Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(snowflake): only compile sample to TABLESAMPLE on physical tables #10218

Merged
merged 1 commit into from
Sep 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion ibis/backends/sql/compilers/snowflake.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,12 @@ class SnowflakeCompiler(SQLGlotCompiler):
LOWERED_OPS = {
ops.Log2: lower_log2,
ops.Log10: lower_log10,
ops.Sample: lower_sample(),
# Snowflake's TABLESAMPLE _can_ work on subqueries, but only by row and without
# a seed. This is effectively the same as `t.filter(random() <= fraction)`, and
# using TABLESAMPLE here would almost certainly have no benefit over the filter
# version in the optimized physical plan. To avoid a special case just for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, what is the special case here?

# snowflake, we only use TABLESAMPLE on physical tables.
ops.Sample: lower_sample(physical_tables_only=True),
}

UNSUPPORTED_OPS = (
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,6 @@ FROM (
FROM "test" AS "t0"
WHERE
"t0"."x" > 10
) AS "t1" TABLESAMPLE system (50.0)
) AS "t1"
WHERE
UNIFORM(TO_DOUBLE(0.0), TO_DOUBLE(1.0), RANDOM()) <= 0.5
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,6 @@ FROM (
FROM "test" AS "t0"
WHERE
"t0"."x" > 10
) AS "t1" TABLESAMPLE bernoulli (50.0)
) AS "t1"
WHERE
UNIFORM(TO_DOUBLE(0.0), TO_DOUBLE(1.0), RANDOM()) <= 0.5
17 changes: 1 addition & 16 deletions ibis/backends/tests/test_generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -2125,22 +2125,7 @@ def test_dynamic_table_slice_with_computed_offset(backend):


@pytest.mark.notimpl(["druid", "risingwave"], raises=com.OperationNotDefinedError)
@pytest.mark.parametrize(
"method",
[
"row",
param(
"block",
marks=[
pytest.mark.notimpl(
["snowflake"],
raises=SnowflakeProgrammingError,
reason="SAMPLE clause on views only supports row wise sampling without seed.",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message was a bit confusing to me. I initially thought it was saying "SAMPLE on subqueries with no seed only works row wise" (implying that if a seed was set, then block wise would work), but actualy it means "SAMPLE on subqueries only works row wise with no seed"

)
],
),
],
)
@pytest.mark.parametrize("method", ["row", "block"])
@pytest.mark.parametrize("subquery", [True, False], ids=["subquery", "table"])
@pytest.mark.xfail_version(pyspark=["sqlglot==25.17.0"])
def test_sample(backend, method, alltypes, subquery):
Expand Down
Loading