
Resolve Change Relation Type error between Distributed and non-distributed Materializations #206

Open · wants to merge 2 commits into main
Conversation

@gfunc (Contributor) commented Nov 10, 2023

Summary

Resolves #205.
Changing the relation type from a non-distributed materialization to a distributed one (or vice versa) produces confusing error messages, and full-refresh does not help.

Changes

  1. Added a macro validate_relation_existence to:
    1. raise an error when an existing relation's on-cluster status is not as expected (e.g. a relation first created with the table materialization and then changed to distributed_table)
    2. drop the existing relation when the full-refresh flag is provided
    3. drop the existing relation when the relation type changes (view to table or vice versa), since the view materialization is created on cluster by default
  2. Added a test in test_changing_relation_type.py
  3. Fixed the clickhouse__list_relations_without_caching macro: if the cluster has only one node, is_on_cluster is set to true.
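The drop/raise logic described above can be sketched as a dbt macro. This is an illustrative sketch only, not the PR's actual diff; `should_on_cluster` is assumed to be an attribute of dbt-clickhouse's relation object, while `flags.FULL_REFRESH`, `adapter.drop_relation`, and `exceptions.raise_compiler_error` are standard members of the dbt Jinja context:

```jinja
{% macro validate_relation_existence(existing_relation, target_relation) %}
  {# Sketch: reconcile a cached relation with what the materialization expects #}
  {% if existing_relation is not none %}
    {% if flags.FULL_REFRESH %}
      {# --full-refresh: drop whatever exists and rebuild from scratch #}
      {% do adapter.drop_relation(existing_relation) %}
    {% elif existing_relation.type != target_relation.type %}
      {# view <-> table change: views default to ON CLUSTER, so rebuild #}
      {% do adapter.drop_relation(existing_relation) %}
    {% elif existing_relation.should_on_cluster != target_relation.should_on_cluster %}
      {# e.g. created as `table`, now materialized as `distributed_table` #}
      {% do exceptions.raise_compiler_error(
        'Relation ' ~ existing_relation ~ ' has an unexpected ON CLUSTER status; '
        ~ 'run with --full-refresh to rebuild it.'
      ) %}
    {% endif %}
  {% endif %}
{% endmacro %}
```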

Checklist


  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG

@genzgd (Contributor) commented Nov 23, 2023

Thanks for the investigation here @gfunc. I think we should limit the effect to distributed models only, since I'm uncomfortable adding a lot of code (and extra SQL queries) to validate that the table exists in the non-distributed case. In theory the state should be totally controlled by dbt. If we take the validation piece out of the main table materialization, does a full refresh still create a "clean" state for the non-distributed case?

@gfunc (Contributor, Author) commented Nov 29, 2023

Hi @genzgd, thanks for your comment.
I think the answer is no. To my understanding, the table materialization (not incremental) is currently barely affected by the full-refresh flag, except for grants. I will validate that shortly.
Also, perhaps a few extra SQL queries to handle unexpected existing relations are reasonable even in a dbt-controlled environment? I added only one extra SQL query to handle full-refresh; existing relations are loaded from the cache, so performance is not affected.

@BentsiLeviav BentsiLeviav self-assigned this Jun 6, 2024
@BentsiLeviav (Contributor) commented:

Hi @gfunc,

As a start, can we limit the changes to distributed cases only, as @genzgd suggested? (And add a feature flag / config for the validation part?)

@gfunc (Contributor, Author) commented Jun 11, 2024

Hi @BentsiLeviav,
That could be managed; I could drop those changes from the table and incremental materializations.
But IMO the main problem is that we should make full-refresh effective when a relation's materialization changes from table to distributed_table and vice versa.
However, the handling of the full-refresh flag lives inside the materializations. Do you have any suggestions?
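One possible shape for this, sketched here as an assumption rather than a concrete proposal: each affected materialization could consult dbt's built-in `should_full_refresh()` macro before inspecting the cached relation, so a relation left over from a different materialization type is dropped up front:

```jinja
{% materialization distributed_table, adapter='clickhouse' %}
  {# Sketch: make --full-refresh effective even when the materialization type changed #}
  {% set existing_relation = load_cached_relation(this) %}
  {% if existing_relation is not none and should_full_refresh() %}
    {% do adapter.drop_relation(existing_relation) %}
    {% set existing_relation = none %}
  {% endif %}
  {# ... the rest of the materialization proceeds against a clean state ... #}
{% endmaterialization %}
```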

Successfully merging this pull request may close these issues.

Not ADD ON CLUSTER when create __dbt_backup