-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unused library imports, update dbt task #539
Conversation
TODO: update dbt images once built for testing |
…stellar/stellar-etl-airflow into patch/update-dbt-node-selection
airflow_variables_dev.json
Outdated
@@ -124,7 +124,7 @@ | |||
"partnership_assets__account_holders_activity_fact": false, | |||
"partnership_assets__asset_activity_fact": false | |||
}, | |||
"dbt_image_name": "stellar/stellar-dbt:96cd862b1", | |||
"dbt_image_name": "stellar/stellar-dbt-dev:4b8a2ecc4", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will swap once finished testing in test
@@ -35,9 +35,9 @@ | |||
|
|||
# DBT models to run | |||
enriched_history_operations_task = dbt_task( | |||
dag, tag="enriched_history_operations", excluded="singular_test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
singular test is set through a env var
operator="+", | ||
excluded="stellar_dbt_public", | ||
) | ||
trade_agg_task = dbt_task(dag, tag="trade_agg", operator="+") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trade agg is defined in the public project
operator="+", | ||
excluded="stellar_dbt_public", | ||
) | ||
trade_agg_task = dbt_task(dag, tag="trade_agg", operator="+") | ||
fee_stats_agg_task = dbt_task(dag, tag="fee_stats") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not exclude stellar_dbt_public when mart is part of public repo.
When do we not want to put operator='+'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo, only when the mart uses only tables built in the enriched_history
or current_state
data pipelines. ie, if your model only uses enriched_history_operations
as a source, has no intermediate tables, etc. This is why I left default operator settings for fee_stats_agg
and network_stats_agg
The models cannot have custom intermediate or staging models for this to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. So, we check lineage. If mart is not dependent on intermediate tables / custom sources, we use default operator to avoid race conditions. Otherwise, we use +
operator.
That should work. But also, let's continue our discussion in ofc hours for using other ways of model selection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree. This is still imperfect and I think we can come up with an better alternative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question on the usage of operator. Otherwise looks good to me
PR Checklist
PR Structure
otherwise).
Thoroughness
What
Updating orchestration flows for the dbt marts. Historically, all staging tables and intermediate tables had to be manually tagged in order to execute for a domain pipeline (ie, fee_stats, enriched_history). This required the developer to remember to tag all models appropriately, which led to mistagged and forgotten models. The flow now uses
+
operator where appropriate to orchestrate pipelines.I also cleaned up library imports.
Also disabled the scd_snapshot state and soroban workflows as the tables are broken and should not be used.
Why
Staging and intermediate models do not need tagging if node selection is inclusive of upstream models.
Known limitations
[TODO or N/A]