
Resolve Change Relation Type error between Distributed and non-distributed Materializations #206

Open · wants to merge 2 commits into main
Conversation

@gfunc (Contributor) commented Nov 10, 2023

Summary

Resolves #205.
Changing the relation type from a non-distributed materialization to a distributed one (or vice versa) produces confusing error messages, and full-refresh does not help.

Changes

  1. Added a macro validate_relation_existence to:
    1. raise an error when an existing relation's on-cluster status is not as expected (e.g. a relation first created with the table materialization and then changed to distributed_table)
    2. drop the existing relation when the full-refresh flag is provided
    3. drop the existing relation when the relation type changes (view to table or vice versa), since the view materialization is created on cluster by default
  2. Added a test in test_changing_relation_type.py
  3. Fixed the clickhouse__list_relations_without_caching macro: if the cluster has only one node, is_on_cluster is set to true.
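The drop/raise logic described above can be sketched as a dbt macro. This is an illustrative sketch only, not the PR's actual diff; `should_on_cluster` is assumed to be an attribute of dbt-clickhouse's relation object, while `flags.FULL_REFRESH`, `adapter.drop_relation`, and `exceptions.raise_compiler_error` are standard members of the dbt Jinja context:

```jinja
{% macro validate_relation_existence(existing_relation, target_relation) %}
  {# Sketch: reconcile a cached relation with what the materialization expects #}
  {% if existing_relation is not none %}
    {% if flags.FULL_REFRESH %}
      {# --full-refresh: drop whatever exists and rebuild from scratch #}
      {% do adapter.drop_relation(existing_relation) %}
    {% elif existing_relation.type != target_relation.type %}
      {# view <-> table change: views default to ON CLUSTER, so rebuild #}
      {% do adapter.drop_relation(existing_relation) %}
    {% elif existing_relation.should_on_cluster != target_relation.should_on_cluster %}
      {# e.g. created as `table`, now materialized as `distributed_table` #}
      {% do exceptions.raise_compiler_error(
        'Relation ' ~ existing_relation ~ ' has an unexpected ON CLUSTER status; '
        ~ 'run with --full-refresh to rebuild it.'
      ) %}
    {% endif %}
  {% endif %}
{% endmacro %}
```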

Checklist


  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG

@genzgd (Contributor) commented Nov 23, 2023

Thanks for the investigation here @gfunc. I think we should limit the effect to distributed models only, since I'm uncomfortable adding a lot of code (and extra SQL queries) to validate that the table exists in the non-distributed case. In theory the state should be totally controlled by dbt. If we take the validation piece out of the main table materialization, does a full refresh still create a "clean" state for the non-distributed case?

@gfunc (Contributor, Author) commented Nov 29, 2023

Hi @genzgd, thanks for your comment.
I think the answer is no. To my understanding, the table materialization (not incremental) is currently barely affected by the full-refresh flag, except for grants. I will validate that shortly.
Also, perhaps a few extra SQL queries to handle unexpected existing relations are reasonable even in a dbt-controlled environment? I added only one extra SQL query to handle full-refresh; existing relations are loaded from the cache, so performance is not affected.

@BentsiLeviav BentsiLeviav self-assigned this Jun 6, 2024
@BentsiLeviav (Contributor) commented:

Hi @gfunc,

As a start, can we limit the changes to distributed cases only, as @genzgd suggested? (And add a feature flag / config for the validation part?)

@gfunc (Contributor, Author) commented Jun 11, 2024

Hi @BentsiLeviav,
That could be managed; I could drop those changes from the table and incremental materializations.
But IMO the main problem is that we should make full-refresh effective when a relation's materialization changes from table to distributed_table and vice versa.
However, the handling of the full-refresh flag lives inside the materializations. Do you have any suggestions?
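One possible shape for this, sketched here as an assumption rather than a concrete proposal: each affected materialization could consult dbt's built-in `should_full_refresh()` macro before inspecting the cached relation, so a relation left over from a different materialization type is dropped up front:

```jinja
{% materialization distributed_table, adapter='clickhouse' %}
  {# Sketch: make --full-refresh effective even when the materialization type changed #}
  {% set existing_relation = load_cached_relation(this) %}
  {% if existing_relation is not none and should_full_refresh() %}
    {% do adapter.drop_relation(existing_relation) %}
    {% set existing_relation = none %}
  {% endif %}
  {# ... the rest of the materialization proceeds against a clean state ... #}
{% endmaterialization %}
```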

Successfully merging this pull request may close these issues.

Not ADD ON CLUSTER when create __dbt_backup