Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sc-28901] Validate query logs as they're parsed #986

Closed

Conversation

usefulalgorithm
Copy link
Contributor

@usefulalgorithm usefulalgorithm commented Sep 23, 2024

🤔 Why?

We should validate the query logs as they're parsed, otherwise bugs are harder to find when they get to the lineage parser.

🤓 What?

Validate the parsed query logs before writing them to file.

Tested this against our own Snowflake instance (13,589 queries)

  • No validation: 36 secs
截圖 2024-09-23 中午12 53 10
  • Validated: 1 min 37 secs
截圖 2024-09-23 中午12 51 24

🧪 Tested?

Tested against

  • BigQuery
  • Snowflake (see above)
  • Unity Catalog

☑️ Checks

  • My PR contains actual code changes, and I have updated the version number in pyproject.toml.

Copy link

github-actions bot commented Sep 23, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
12578 11242 89% 85% 🟢

New Files

No new covered files...

Modified Files

File Coverage Status
metaphor/common/file_sink.py 98% 🟢
TOTAL 98% 🟢

updated for commit: cca070e by action🐍

Copy link

codecov bot commented Sep 23, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.37%. Comparing base (167ccb7) to head (cca070e).

Files with missing lines Patch % Lines
metaphor/common/file_sink.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #986      +/-   ##
==========================================
- Coverage   89.38%   89.37%   -0.01%     
==========================================
  Files         191      191              
  Lines       12575    12578       +3     
==========================================
+ Hits        11240    11242       +2     
- Misses       1335     1336       +1     
Flag Coverage Δ
89.37% <75.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@elic-eon elic-eon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we validate them when ingestion?

@usefulalgorithm
Copy link
Contributor Author

Didn't we validate them when ingestion?

No, ingestion does not look at what's in the query log files.

@elic-eon
Copy link
Contributor

Didn't we validate them when ingestion?

No, ingestion does not look at what's in the query log files.

Got it, can we also valid entities in the MCE as well? We encounter the MCE validation error when ingestion many times, if we can catch the error earlier, it is better.

@usefulalgorithm
Copy link
Contributor Author

Didn't we validate them when ingestion?

No, ingestion does not look at what's in the query log files.

Got it, can we also valid entities in the MCE as well? We encounter the MCE validation error when ingestion many times, if we can catch the error earlier, it is better.

MCEs are partially validated: https://github.com/MetaphorData/connectors/blob/main/metaphor/common/sink.py#L23C1-L28C14

It only validates against the json schema though, so it's kind of limited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants