[sc-27944] Add config to disable tag collection in Snowflake (#992)
usefulalgorithm authored Oct 1, 2024
1 parent c58fb73 commit 576d867
Showing 4 changed files with 19 additions and 3 deletions.
8 changes: 8 additions & 0 deletions metaphor/snowflake/README.md
@@ -122,6 +122,14 @@ account_usage_schema: <db_name>.<schema_name>

See [Tag Matcher Config](../common/docs/tag_matcher.md) for more information on the optional `tag_matcher` config.

#### Disable Platform Tags Collection

To stop the crawler from collecting platform tags from Snowflake, set `collect_tags` to `false`:

```yaml
collect_tags: false # Default is true.
```

#### Query Logs

By default, the Snowflake connector fetches the previous day's query logs and analyzes them for additional metadata, such as dataset usage and lineage information. To backfill log data, set `lookback_days` to the desired number of days. To turn off query log fetching, set `lookback_days` to 0.
3 changes: 3 additions & 0 deletions metaphor/snowflake/config.py
@@ -58,6 +58,9 @@ class SnowflakeBaseConfig(SnowflakeAuthConfig):
# The fully qualified schema that contains all the account_usage views
account_usage_schema: str = "SNOWFLAKE.ACCOUNT_USAGE"

# Whether to collect platform tags.
collect_tags: bool = True


@dataclass(config=ConnectorConfig)
class SnowflakeConfig(SnowflakeBaseConfig):
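
The effect of the new field's default can be sketched with a standard-library dataclass. This is a minimal illustration only: `BaseConfigSketch` and `load_config` are hypothetical names, and the real connector uses a pydantic dataclass with `ConnectorConfig` rather than `dataclasses`.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class BaseConfigSketch:
    """Hypothetical stand-in for SnowflakeBaseConfig, reduced to two fields."""

    # The fully qualified schema that contains all the account_usage views
    account_usage_schema: str = "SNOWFLAKE.ACCOUNT_USAGE"

    # Whether to collect platform tags; omitting the key keeps the old behavior.
    collect_tags: bool = True


def load_config(raw: Dict[str, Any]) -> BaseConfigSketch:
    """Build the config from an already-parsed mapping (e.g. loaded YAML)."""
    return BaseConfigSketch(**raw)


default_config = load_config({})
disabled_config = load_config({"collect_tags": False})
```

Because the default is `True`, existing config files that never mention `collect_tags` continue to collect tags; only an explicit `collect_tags: false` opts out.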
9 changes: 7 additions & 2 deletions metaphor/snowflake/extractor.py
@@ -171,7 +171,10 @@ async def extract(self) -> Collection[ENTITY_TYPES]:

 self._fetch_primary_keys(cursor)
 self._fetch_unique_keys(cursor)
-self._fetch_tags(cursor)
+
+# Only fetch the tags when collect_tags is True
+if self._config.collect_tags:
+    self._fetch_tags(cursor)

 datasets = list(self._datasets.values())
 tag_datasets(datasets, self._tag_matchers)
@@ -902,7 +905,9 @@ def _init_dataset(
     database=database, schema=schema, table=table
 )

-dataset.system_tags = SystemTags(tags=[])
+# Only initialize this when collect_tags is True
+if self._config.collect_tags:
+    dataset.system_tags = SystemTags(tags=[])

 return dataset
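
The two extractor changes work together: tag fetching is skipped, and `system_tags` is never initialized, so a disabled run emits no tag field at all rather than an empty list. A rough, self-contained sketch of that behavior follows. `ExtractorSketch`, `DatasetSketch`, `SystemTagsSketch`, and the `"pii"` tag are hypothetical stand-ins, not the connector's actual classes or data, and the `None` default for `system_tags` is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SystemTagsSketch:
    tags: List[str] = field(default_factory=list)


@dataclass
class DatasetSketch:
    name: str
    # Assumed to stay unset unless tag collection is enabled.
    system_tags: Optional[SystemTagsSketch] = None


class ExtractorSketch:
    def __init__(self, collect_tags: bool = True) -> None:
        self.collect_tags = collect_tags
        self.datasets: Dict[str, DatasetSketch] = {}
        self.fetch_tags_calls = 0  # counts calls so the gating is observable

    def _init_dataset(self, name: str) -> DatasetSketch:
        dataset = DatasetSketch(name=name)
        # Only initialize this when collect_tags is True
        if self.collect_tags:
            dataset.system_tags = SystemTagsSketch(tags=[])
        self.datasets[name] = dataset
        return dataset

    def _fetch_tags(self) -> None:
        # Stand-in for the Snowflake tag query; attaches a made-up tag.
        self.fetch_tags_calls += 1
        for dataset in self.datasets.values():
            if dataset.system_tags is not None:
                dataset.system_tags.tags.append("pii")

    def extract(self) -> List[DatasetSketch]:
        self._init_dataset("db.schema.table")
        # Only fetch the tags when collect_tags is True
        if self.collect_tags:
            self._fetch_tags()
        return list(self.datasets.values())


enabled_result = ExtractorSketch(collect_tags=True).extract()
disabled_extractor = ExtractorSketch(collect_tags=False)
disabled_result = disabled_extractor.extract()
```

A check like this could serve as the basis for a unit test of the flag: with collection disabled, the tag query is never issued and each dataset's `system_tags` stays unset.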

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "metaphor-connectors"
-version = "0.14.110"
+version = "0.14.111"
license = "Apache-2.0"
description = "A collection of Python-based 'connectors' that extract metadata from various sources to ingest into the Metaphor app."
authors = ["Metaphor <[email protected]>"]