Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: table-key-properties in metadata extra does not populate schema key_properties for tap-bigquery #2660

Open
1 task
ReubenFrankel opened this issue Sep 11, 2024 · 1 comment
Labels

Comments

@ReubenFrankel
Copy link
Contributor

ReubenFrankel commented Sep 11, 2024

Singer SDK Version

0.39.1

Is this a regression?

  • Yes

Python Version

3.9

Bug scope

Taps (catalog, state, etc.)

Operating System

Ubuntu 22.04 LTS

Description

I've been trying to use table-key-properties to configure key properties for a BigQuery stream:

plugins:
  extractors:
  - name: tap-bigquery
    variant: meltanolabs
    pip_url: git+https://github.com/Matatika/tap-bigquery.git
    metadata:
      '*-events_*':
        table-key-properties:
        - event_timestamp
        - event_name
        - event_bundle_sequence_id

In this case, key_properties is empty so when loading to a database target no primary keys are configured:

Generated catalog
{
  "streams": [
    {
      "tap_stream_id": "analytics_376142012-events_20240825",
      "table_name": "events_20240825",
      "replication_method": "",
      "key_properties": [],
      "schema": ...,
      "is_view": false,
      "stream": "analytics_376142012-events_20240825",
      "metadata": [
        ...,
        {
          "breadcrumb": [],
          "metadata": {
            "inclusion": "available",
            "table-key-properties": [
              "event_timestamp",
              "event_name",
              "event_bundle_sequence_id"
            ],
            "forced-replication-method": "",
            "schema-name": "analytics_376142012",
            "selected": true
          }
        },
        ...
      ],
      "selected": true,
      "table_key_properties": [
        "event_timestamp",
        "event_name",
        "event_bundle_sequence_id"
      ]
    }
  ]
}

Primary keys are configured when using key-properties/key_properties (not documented here):

    metadata:
      '*-events_*':
        key-properties:
        - event_timestamp
        - event_name
        - event_bundle_sequence_id
Generated catalog
{
  "streams": [
    {
      "tap_stream_id": "analytics_376142012-events_20240825",
      "table_name": "events_20240825",
      "replication_method": "",
      "key_properties": [
        "event_timestamp",
        "event_name",
        "event_bundle_sequence_id"
      ],
      "schema": ...,
      "is_view": false,
      "stream": "analytics_376142012-events_20240825",
      "metadata": [
        ...,
        {
          "breadcrumb": [],
          "metadata": {
            "inclusion": "available",
            "table-key-properties": [],
            "forced-replication-method": "",
            "schema-name": "analytics_376142012",
            "selected": true,
            "key-properties": [
              "event_timestamp",
              "event_name",
              "event_bundle_sequence_id"
            ]
          }
        },
        ...
      ],
      "selected": true
    }
  ]
}
    metadata:
      '*-events_*':
        key_properties:
        - event_timestamp
        - event_name
        - event_bundle_sequence_id
Generated catalog
{
  "streams": [
    {
      "tap_stream_id": "analytics_376142012-events_20240825",
      "table_name": "events_20240825",
      "replication_method": "",
      "key_properties": [
        "event_timestamp",
        "event_name",
        "event_bundle_sequence_id"
      ],
      "schema": ...,
      "is_view": false,
      "stream": "analytics_376142012-events_20240825",
      "metadata": [
        ...,
        {
          "breadcrumb": [],
          "metadata": {
            "inclusion": "available",
            "table-key-properties": [],
            "forced-replication-method": "",
            "schema-name": "analytics_376142012",
            "selected": true,
            "key_properties": [
              "event_timestamp",
              "event_name",
              "event_bundle_sequence_id"
            ]
          }
        },
        ...
      ],
      "selected": true
    }
  ]
}

Why does this work? What is the correct way to do this? I assume table-key-properties is the intended approach, so I don't know if I'm going to run into some undefined behaviour down the line.


Related:

Code

No response

Link to Slack/Linen

https://meltano.slack.com/archives/C069CQNHDNF/p1725976166433339

@ReubenFrankel
Copy link
Contributor Author

Worth noting that you can also set key properties via stream maps:

    config:
      stream_maps:
        '*-events_*':
          __key_properties__:
          - event_timestamp
          - event_name
          - event_bundle_sequence_id

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants