
Refactor Unity Catalog to fetch catalog/schema/table metadata from System tables #1022

Merged


@mars-lan (Contributor) commented Oct 25, 2024

🤔 Why?

Fetching metadata from system.information_schema is preferred over querying each catalog's own information_schema, as the former doesn't require granting SELECT permission on every table.

Note that SELECT permission is still required to retrieve a table's properties and last refresh date, as they're not available from any of the system tables.
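For illustration, the sketch below builds the kind of query this approach relies on: a single statement against system.information_schema.tables that covers every catalog, rather than one query per catalog. The helper name build_tables_query, the `?` placeholder style, and the exact column list are assumptions based on the Databricks information_schema TABLES view, not code taken from queries.py.

```python
from typing import List, Optional, Tuple


def build_tables_query(catalog: Optional[str] = None) -> Tuple[str, List[str]]:
    """Build a query listing tables from system.information_schema.tables.

    Hypothetical helper: returns the SQL text plus positional parameters.
    The `?` placeholder is an assumption; use whatever paramstyle your driver expects.
    """
    sql = (
        "SELECT table_catalog, table_schema, table_name, table_type, "
        "table_owner, comment, created, created_by, last_altered, last_altered_by "
        "FROM system.information_schema.tables"
    )
    params: List[str] = []
    if catalog is not None:
        # Optionally narrow the metastore-wide view down to a single catalog.
        sql += " WHERE table_catalog = ?"
        params.append(catalog)
    sql += " ORDER BY table_catalog, table_schema, table_name"
    return sql, params
```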

🤓 What?

  • Replace all REST API calls with queries against system.information_schema. The only exception is IAM-related metadata, which is only available via REST APIs.
  • Consolidate all queries into queries.py for easier testing and better organization.
  • Update the Databricks dependencies to their latest versions.
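Keeping the queries in one module makes them straightforward to unit-test: if each helper takes a DB-API cursor, a test can pass an in-memory fake instead of a live SQL warehouse. A minimal sketch of that pattern follows; list_catalogs, CatalogInfo and FakeCursor are hypothetical names, and the column names assume the Databricks information_schema CATALOGS view.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class CatalogInfo:
    # Hypothetical record type; the project's real models may differ.
    catalog_name: str
    owner: Optional[str]
    comment: Optional[str]


def _rows_as_dicts(cursor: Any) -> List[dict]:
    """Turn DB-API rows into dicts keyed by column name (via cursor.description)."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def list_catalogs(cursor: Any) -> List[CatalogInfo]:
    """Query every catalog visible in system.information_schema.catalogs."""
    cursor.execute(
        "SELECT catalog_name, catalog_owner, comment "
        "FROM system.information_schema.catalogs ORDER BY catalog_name"
    )
    return [
        CatalogInfo(r["catalog_name"], r["catalog_owner"], r["comment"])
        for r in _rows_as_dicts(cursor)
    ]


class FakeCursor:
    """In-memory stand-in for a DB-API cursor, intended only for tests."""

    def __init__(self, columns: List[str], rows: List[tuple]) -> None:
        self.description = [(name,) for name in columns]
        self._rows = rows
        self.executed: List[str] = []

    def execute(self, operation: str, parameters: Any = None) -> None:
        # Record the SQL so tests can assert on it; no database is contacted.
        self.executed.append(operation)

    def fetchall(self) -> List[tuple]:
        return list(self._rows)
```

Because the helper only touches execute, description and fetchall, the same function should work against any cursor that follows PEP 249.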

🧪 Tested?

Tested end-to-end against a production instance. Verified that the MCEs produced before and after the change are identical, except for the following:

Columns

  • Precision is now set for date-related types.
  • nullable & tag are now set.
  • nativeType now uses upper case.
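The column changes above amount to a row-to-field mapping along these lines. This is a sketch only: SchemaField and to_schema_field are hypothetical, the input keys assume the information_schema COLUMNS view (including a datetime_precision column), and column tags are left out for brevity.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class SchemaField:
    # Hypothetical field model; names loosely mirror the PR description.
    field_name: str
    native_type: str
    nullable: bool
    precision: Optional[float]
    description: Optional[str]


def to_schema_field(row: Mapping[str, Any]) -> SchemaField:
    """Map one system.information_schema.columns row to a SchemaField."""
    # Prefer numeric_precision; fall back to datetime_precision for date/time types.
    precision = row.get("numeric_precision")
    if precision is None:
        precision = row.get("datetime_precision")
    return SchemaField(
        field_name=row["column_name"],
        # upper() is idempotent, so this is safe whether or not the source is already upper case.
        native_type=str(row["data_type"]).upper(),
        # information_schema reports nullability as the strings 'YES' / 'NO'.
        nullable=row.get("is_nullable") == "YES",
        precision=float(precision) if precision is not None else None,
        description=row.get("comment"),
    )
```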

Table properties

  • Missing Spark-related properties as they're not available via the SHOW TBLPROPERTIES command.
  • Use the property's raw values instead of JSON-encoded strings.
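Table properties come from SHOW TBLPROPERTIES, which returns one (key, value) row per property. The sketch below, using hypothetical helper names, builds that statement with backtick-quoted identifiers and keeps each value as its raw string.

```python
from typing import Dict, Iterable, Tuple


def quote_identifier(name: str) -> str:
    """Backtick-quote a SQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def show_tblproperties_sql(catalog: str, schema: str, table: str) -> str:
    """Build a SHOW TBLPROPERTIES statement for a fully qualified table name."""
    full_name = ".".join(quote_identifier(part) for part in (catalog, schema, table))
    return f"SHOW TBLPROPERTIES {full_name}"


def to_properties(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep each property's raw string value rather than a JSON-encoded copy."""
    return {key: value for key, value in rows}
```

Keeping raw values avoids double encoding: json.dumps("1") produces the string '"1"', with literal quote characters, whereas the raw value is simply 1.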

Hierarchy

  • Set createdAtSource, createdBy, lastUpdated, updatedBy in sourceInfo
  • Set systemContacts & systemDescription
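The hierarchy fields could be populated from the audit columns that system.information_schema exposes for catalogs and schemas (created, created_by, last_altered, last_altered_by, plus an owner and a comment). In this sketch, which uses hypothetical dataclasses rather than the project's real models, the owner is assumed to be the system contact and the comment the system description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, List, Mapping, Optional


@dataclass
class SourceInfo:
    # Hypothetical container; field names follow the PR description.
    created_at_source: Optional[datetime]
    created_by: Optional[str]
    last_updated: Optional[datetime]
    updated_by: Optional[str]


@dataclass
class HierarchyInfo:
    # Hypothetical container for catalog/schema-level metadata.
    source_info: SourceInfo
    system_contacts: List[str]
    system_description: Optional[str]


def to_hierarchy_info(row: Mapping[str, Any], owner_column: str) -> HierarchyInfo:
    """Map a catalogs or schemata row to hierarchy metadata.

    owner_column is 'catalog_owner' for catalogs or 'schema_owner' for schemas
    (column names assumed from the information_schema views).
    """
    owner = row.get(owner_column)
    return HierarchyInfo(
        source_info=SourceInfo(
            created_at_source=row.get("created"),
            created_by=row.get("created_by"),
            last_updated=row.get("last_altered"),
            updated_by=row.get("last_altered_by"),
        ),
        # Assumption: the owner is treated as the single system contact.
        system_contacts=[owner] if owner else [],
        system_description=row.get("comment"),
    )
```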

☑️ Checks

  • My PR contains actual code changes, and I have updated the version number in pyproject.toml.

@mars-lan changed the title from “Refactor Unity Catalog to fetch catalog/schema/table metadata from Sy…” to “Refactor Unity Catalog to fetch catalog/schema/table metadata from System tables” Oct 25, 2024
@mars-lan force-pushed the marslan/sc-29346/use-system-information-schema-in-unity-catalog branch 2 times, most recently from 76c7f35 to 21d58eb on October 28, 2024 00:12

github-actions bot commented Oct 28, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|---|---|---|---|---|
| 13282 | 11863 | 89% | 85% | 🟢 |

New Files

| File | Coverage | Status |
|---|---|---|
| metaphor/unity_catalog/queries.py | 97% | 🟢 |
| TOTAL | 97% | 🟢 |

Modified Files

| File | Coverage | Status |
|---|---|---|
| metaphor/unity_catalog/extractor.py | 95% | 🟢 |
| metaphor/unity_catalog/models.py | 100% | 🟢 |
| metaphor/unity_catalog/profile/extractor.py | 93% | 🟢 |
| metaphor/unity_catalog/utils.py | 87% | 🟢 |
| TOTAL | 94% | 🟢 |

updated for commit: 51ebe03 by action🐍


codecov bot commented Oct 28, 2024

Codecov Report

Attention: Patch coverage is 96.75516% with 11 lines in your changes missing coverage. Please review.

Project coverage is 89.31%. Comparing base (db18615) to head (51ebe03).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| metaphor/unity_catalog/extractor.py | 95.83% | 4 Missing ⚠️ |
| metaphor/unity_catalog/queries.py | 97.41% | 4 Missing ⚠️ |
| metaphor/unity_catalog/utils.py | 89.28% | 3 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1022      +/-   ##
==========================================
+ Coverage   89.26%   89.31%   +0.05%     
==========================================
  Files         202      203       +1     
  Lines       13194    13282      +88     
==========================================
+ Hits        11777    11863      +86     
- Misses       1417     1419       +2     
```


@mars-lan force-pushed the marslan/sc-29346/use-system-information-schema-in-unity-catalog branch 6 times, most recently from 5a70db7 to 78b6e88 on October 28, 2024 04:32
@mars-lan force-pushed the marslan/sc-29346/use-system-information-schema-in-unity-catalog branch from 78b6e88 to c485a49 on October 28, 2024 04:44
@mars-lan marked this pull request as ready for review October 28, 2024 04:45
@mars-lan enabled auto-merge (squash) October 28, 2024 04:47
Review thread on metaphor/unity_catalog/extractor.py (resolved)
Review thread on metaphor/unity_catalog/extractor.py (resolved)
Review thread on metaphor/unity_catalog/queries.py (outdated, resolved)
@usefulalgorithm (Contributor) left a comment

update pyproject.toml and it's good

@mars-lan merged commit 955ff89 into main Oct 29, 2024
6 checks passed
@mars-lan deleted the marslan/sc-29346/use-system-information-schema-in-unity-catalog branch October 29, 2024 01:55
usefulalgorithm added a commit that referenced this pull request Oct 30, 2024
# This is the 1st commit message:

fix

add test

Refactor Unity Catalog to fetch catalog/schema/table metadata from System tables (#1022)

# This is the commit message #2:

fix stuff
usefulalgorithm added a commit that referenced this pull request Oct 30, 2024
fix

add test

Refactor Unity Catalog to fetch catalog/schema/table metadata from System tables (#1022)

fix stuff

finish test

finish test

add docs

bump version

fix lock

fix ci

Delete tests/great_expectations/snowflake/config.yml

add git ignore

remove everything