Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unused library imports, update dbt task #539

Merged
merged 16 commits into from
Nov 18, 2024

Conversation

sydneynotthecity
Copy link
Contributor

@sydneynotthecity sydneynotthecity commented Nov 11, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with the jira ticket associated with the PR.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated the README with the added features, breaking changes, new instructions on how to use the repository.

What

Updating orchestration flows for the dbt marts. Historically, all staging tables and intermediate tables had to be manually tagged in order to execute for a domain pipeline (ie, fee_stats, enriched_history). This required the developer to remember to tag all models appropriately, which led to mistagged and forgotten models. The flow now uses + operator where appropriate to orchestrate pipelines.

I also cleaned up library imports.

Also disabled the scd_snapshot state and soroban workflows as the tables are broken and should not be used.

Why

Staging and intermediate models do not need tagging if node selection is inclusive of upstream models.

Known limitations

[TODO or N/A]

@sydneynotthecity sydneynotthecity requested a review from a team as a code owner November 11, 2024 17:31
@sydneynotthecity
Copy link
Contributor Author

TODO: update dbt images once built for testing

@@ -124,7 +124,7 @@
"partnership_assets__account_holders_activity_fact": false,
"partnership_assets__asset_activity_fact": false
},
"dbt_image_name": "stellar/stellar-dbt:96cd862b1",
"dbt_image_name": "stellar/stellar-dbt-dev:4b8a2ecc4",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will swap once finished testing in test

@@ -35,9 +35,9 @@

# DBT models to run
enriched_history_operations_task = dbt_task(
dag, tag="enriched_history_operations", excluded="singular_test"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

singular test is set through a env var

operator="+",
excluded="stellar_dbt_public",
)
trade_agg_task = dbt_task(dag, tag="trade_agg", operator="+")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trade agg is defined in the public project

operator="+",
excluded="stellar_dbt_public",
)
trade_agg_task = dbt_task(dag, tag="trade_agg", operator="+")
fee_stats_agg_task = dbt_task(dag, tag="fee_stats")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not exclude stellar_dbt_public when mart is part of public repo.
When do we not want to put operator='+'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo, only when the mart uses only tables built in the enriched_history or current_state data pipelines. ie, if your model only uses enriched_history_operations as a source, has no intermediate tables, etc. This is why I left default operator settings for fee_stats_agg and network_stats_agg

The models cannot have custom intermediate or staging models for this to work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. So, we check lineage. If mart is not dependent on intermediate tables / custom sources, we use default operator to avoid race conditions. Otherwise, we use + operator.
That should work. But also, let's continue our discussion in ofc hours for using other ways of model selection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree. This is still imperfect and I think we can come up with an better alternative

Copy link
Contributor

@amishas157 amishas157 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question on the usage of operator. Otherwise looks good to me

@sydneynotthecity sydneynotthecity changed the title Remove unused library imports, update dbt task default Remove unused library imports, update dbt task Nov 18, 2024
@sydneynotthecity sydneynotthecity merged commit 33c0ff8 into master Nov 18, 2024
3 checks passed
@sydneynotthecity sydneynotthecity deleted the patch/update-dbt-node-selection branch November 18, 2024 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants