This connector extracts technical metadata from Unity Catalog using the Unity Catalog API.
Create an access token in the Databricks workspace under User Settings > Developer > Access tokens.
To extract data lineage from Unity Catalog, you'll need to enable the system.access schema and grant the required permissions to the user. Please also read and understand the feature's limitations.
Make sure to grant the user the BROWSE (or SELECT) privilege on all tables in order to retrieve the complete lineage graph. See this section for more details.
Create a YAML config file based on the following template.
hostname: <cluster_or_warehouse_hostname>
http_path: <http_path>
token: <access_token>
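As a minimal sketch of filling in the template above without hard-coding the token, the following script renders the three required keys from values supplied at run time. The `render_config` helper and the `DATABRICKS_TOKEN` environment variable name are illustrative assumptions, not part of the connector.

```python
import json
import os


def render_config(hostname: str, http_path: str, token: str) -> str:
    """Render the three required config keys as YAML text.

    json.dumps produces double-quoted strings, which are also valid
    YAML scalars, so unusual characters in a value do not break the file.
    """
    fields = {"hostname": hostname, "http_path": http_path, "token": token}
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in fields.items())


# Assumption: the token is exported as DATABRICKS_TOKEN before running.
# The hostname and http_path placeholders must be replaced with real values.
config_text = render_config(
    hostname="<cluster_or_warehouse_hostname>",
    http_path="<http_path>",
    token=os.environ.get("DATABRICKS_TOKEN", "<access_token>"),
)
print(config_text)
```

Redirecting the printed text to a file gives a config that can be passed to the connector, with the optional settings described below appended as needed.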
See this page for details on how to set the values for hostname and http_path.
See Output Config for more information.
See Filter Configurations for more information on the optional filter config.
By default, each table is associated with a Unity Catalog URL derived from the hostname config.
You can override this by specifying your own URL built from the catalog, schema, and table names:
source_url: https://example.com/view/{catalog}/{schema}/{table}
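To illustrate how the placeholders in such a template might expand, here is a minimal sketch using Python's `str.format`. The `build_source_url` helper and the sample catalog, schema, and table names are hypothetical; the connector's actual substitution logic may differ.

```python
# Template copied from the source_url example; placeholders are filled per table.
SOURCE_URL_TEMPLATE = "https://example.com/view/{catalog}/{schema}/{table}"


def build_source_url(template: str, catalog: str, schema: str, table: str) -> str:
    """Fill the {catalog}, {schema}, and {table} placeholders in the template."""
    return template.format(catalog=catalog, schema=schema, table=table)


# Hypothetical table main.sales.orders
url = build_source_url(SOURCE_URL_TEMPLATE, "main", "sales", "orders")
print(url)  # https://example.com/view/main/sales/orders
```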
By default, the Unity Catalog connector fetches the previous day's query logs and analyzes them for additional metadata, such as dataset usage and lineage information. To backfill log data, set lookback_days to the desired number of days. To turn off query log fetching, set lookback_days to 0.
query_log:
  # (Optional) Number of days of query logs to fetch. Defaults to 1. If 0, no query logs will be fetched.
  lookback_days: <days>
  # (Optional) A list of users whose queries will be excluded from the log fetching.
  excluded_usernames:
    - <user_name1>
    - <user_name2>
  # (Optional) Limit the number of results returned in one page of query log history. The default is 100.
  max_results: <count>
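The effect of lookback_days can be sketched as a time-window calculation. The function below is an illustration only: it assumes the window ends at the most recent midnight (UTC here) and spans whole days backward, which matches the "full day's logs from yesterday" default but is not confirmed to be the connector's exact boundary logic. The `query_log_window` name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def query_log_window(
    lookback_days: int, now: datetime
) -> Optional[Tuple[datetime, datetime]]:
    """Return the (start, end) window of query logs to fetch, or None if disabled."""
    if lookback_days < 0:
        raise ValueError("lookback_days must be zero or positive")
    if lookback_days == 0:
        # A value of 0 turns off query log fetching entirely.
        return None
    # Assumption: the window ends at the start of the current day.
    end = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=lookback_days)
    return start, end


now = datetime(2024, 5, 10, 15, 30, tzinfo=timezone.utc)
print(query_log_window(1, now))  # default: all of 2024-05-09
print(query_log_window(7, now))  # backfill: 2024-05-03 through 2024-05-09
print(query_log_window(0, now))  # None, fetching disabled
```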
See Process Query for more information on the optional process_query_config config.
Note: We encourage using a cluster, as SQL warehouse support in this connector will be deprecated.
To run the queries using a specific warehouse, simply add its ID in the configuration file:
warehouse_id: <warehouse_id>
If neither a warehouse ID nor a cluster path is provided, the connector automatically uses the first discovered warehouse.
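The fallback behavior described above can be sketched as a small selection function. This is an illustration under stated assumptions: the `choose_compute` name, the string labels it returns, and the precedence of an explicit warehouse ID over a cluster path are all hypothetical; only the final fallback to the first discovered warehouse comes from the text.

```python
from typing import List, Optional


def choose_compute(
    warehouse_id: Optional[str],
    cluster_path: Optional[str],
    discovered_warehouses: List[str],
) -> str:
    """Pick the compute target used to run queries."""
    if warehouse_id:
        # Assumption: an explicitly configured warehouse takes precedence.
        return f"warehouse:{warehouse_id}"
    if cluster_path:
        return f"cluster:{cluster_path}"
    if not discovered_warehouses:
        raise RuntimeError("no warehouse or cluster configured, and none discovered")
    # Documented fallback: use the first discovered warehouse.
    return f"warehouse:{discovered_warehouses[0]}"


print(choose_compute("abc123", None, ["w1", "w2"]))  # warehouse:abc123
print(choose_compute(None, None, ["w1", "w2"]))      # warehouse:w1
```

Because the fallback depends on discovery order, setting the target explicitly in the config file gives more predictable runs.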
Follow the Installation instructions to install metaphor-connectors in your environment (or virtualenv). Make sure to include either the all or the unity_catalog extra.
Run the following command to test the connector locally:
metaphor unity_catalog <config_file>
Manually verify the output after the command finishes.