Unity Catalog Data Profiling Connector

This connector extracts dataset-level data profiles from Unity Catalog using the Unity Catalog API.

Setup

Create a dedicated access token by following the Setup guide for the general Unity Catalog connector. The owner of the access token must also have the SELECT privilege on the tables so that the connector can analyze their statistics:

GRANT SELECT ON TABLE * TO <user_role>
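You may prefer to grant access one table at a time, for example if your workspace does not accept the wildcard form above. In that case, a small script can generate one statement per fully qualified table name. The following is a minimal sketch: the helper name `grant_statements` and the example table and principal names are hypothetical, and it only builds the SQL text. You would still need to run the statements yourself, for example in a Databricks SQL editor.

```python
def grant_statements(tables, principal):
    """Build one GRANT SELECT statement per table (hypothetical helper).

    `tables` are fully qualified names such as "catalog.schema.table";
    the principal is wrapped in backticks, as Databricks SQL expects for
    user, group, or service principal names.
    """
    return [f"GRANT SELECT ON TABLE {table} TO `{principal}`" for table in tables]


if __name__ == "__main__":
    for stmt in grant_statements(["main.sales.orders", "main.sales.customers"], "metaphor_profiler"):
        print(stmt)
```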

Config File

Create a YAML config file based on the following template.

Required Configurations

hostname: <cluster_or_warehouse_hostname>
http_path: <http_path>
token: <access_token>

See this page for details on how to set the values for hostname and http_path.
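If you generate the config file from a script, you can check the three required keys before writing it. The following is a minimal sketch that uses only the standard library. The function name `render_config` is hypothetical, and the connector performs its own validation when it loads the file. It relies on the fact that `json.dumps` produces valid YAML scalars for strings, integers, and booleans, so the output can be saved directly as the YAML config.

```python
import json

# The keys listed under Required Configurations above.
REQUIRED_KEYS = ("hostname", "http_path", "token")


def render_config(values):
    """Render a flat dict as YAML text after checking the required keys."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"missing required config: {', '.join(missing)}")
    # json.dumps quotes strings and emits true/false and plain integers,
    # all of which YAML parses as the same scalar values.
    lines = [f"{key}: {json.dumps(value)}" for key, value in values.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config({
        "hostname": "<cluster_or_warehouse_hostname>",
        "http_path": "<http_path>",
        "token": "<access_token>",
    }))
```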

Optional Configurations

See Filter Configurations for more information on the optional filter config.

Output Destination

See Output Config for more information on the optional output config.

Concurrency

The maximum number of concurrent queries sent to the Databricks compute node can be configured as follows:

max_concurrency: <max_number_of_queries> # Defaults to 10
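Conceptually, `max_concurrency` caps how many queries are in flight at once. The following sketch illustrates that idea with a thread pool whose size equals the limit. It is an illustration only, not the connector's actual implementation. `profile_tables` is a hypothetical name, and `time.sleep` stands in for a real blocking query. The returned peak shows that no more than `max_concurrency` queries ran simultaneously.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def profile_tables(tables, max_concurrency=10):
    """Run one placeholder 'query' per table with at most max_concurrency in flight."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def run_query(table):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # placeholder for a blocking query against the warehouse
        with lock:
            state["active"] -= 1
        return f"profiled {table}"

    # The pool never runs more than max_workers tasks at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = list(pool.map(run_query, tables))
    return results, state["peak"]


if __name__ == "__main__":
    results, peak = profile_tables([f"table_{i}" for i in range(25)], max_concurrency=4)
    print(len(results), "tables, peak concurrency", peak)
```

A lower value reduces load on the compute node at the cost of a longer overall run.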

Analyze table

To run an ANALYZE TABLE query when a table has no statistics, set:

analyze_if_no_statistics: true # Defaults to false
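The decision this flag controls can be sketched as a small function: a table is analyzed only when it lacks statistics and the flag is enabled. This is a hypothetical illustration. The name `maybe_analyze` is invented, and the exact statement the connector issues is not documented here. `ANALYZE TABLE ... COMPUTE STATISTICS` is standard Databricks SQL, which also offers a `FOR ALL COLUMNS` variant for column-level statistics.

```python
def maybe_analyze(table, has_statistics, analyze_if_no_statistics=False):
    """Return the ANALYZE TABLE statement to run for `table`, or None.

    Mirrors the documented behavior: nothing is run when statistics
    already exist or when analyze_if_no_statistics is false (the default).
    """
    if has_statistics or not analyze_if_no_statistics:
        return None
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS"


if __name__ == "__main__":
    print(maybe_analyze("main.sales.orders", has_statistics=False, analyze_if_no_statistics=True))
```

Because ANALYZE TABLE scans table data, enabling the flag can add noticeable compute cost on large tables.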

Testing

Follow the Installation instructions to install metaphor-connectors in your environment (or virtualenv). Make sure to include either the all or the unity_catalog extra.

Run the following command to test the connector locally:

metaphor unity_catalog.profile <config_file>

Manually verify the output after the command finishes.