Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(chdb): implement chdb - the embedded clickhouse - backend #8497

Closed
wants to merge 11 commits into from

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Feb 29, 2024

A big chunk of the backends testing suite is already passing, but there is still work to do:

  1. Support passing in memory tables, this can be done using the same "pass as a file" trick used in chdb.dataframe
  2. Better error handling
  3. Pass the right dialect everywhere

Test suite is currently at 247 failed, 1067 passed, 28 skipped, 29854 deselected, 131 xfailed, 6 errors


"""
with contextlib.suppress(AttributeError):
query = query.sql(dialect="clickhouse", pretty=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
query = query.sql(dialect="clickhouse", pretty=True)
query = query.sql(dialect=self.dialect, pretty=True)

Comment on lines 97 to 98
names = table.column("name").to_pylist()
types = table.column("type").to_pylist()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
names = table.column("name").to_pylist()
types = table.column("type").to_pylist()
names = table["name"].to_pylist()
types = table["type"].to_pylist()

if external_tables:
raise NotImplementedError("External tables are not yet supported")

df = self.con.query(sql, "arrowtable").to_pandas()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
df = self.con.query(sql, "arrowtable").to_pandas()
df = self.con.query(sql, fmt="arrowtable").to_pandas()

if external_tables:
raise NotImplementedError("External tables are not yet supported")

table = self.con.query(sql, "arrowtable")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
table = self.con.query(sql, "arrowtable")
table = self.con.query(sql, fmt="arrowtable")

Comment on lines 34 to 35
# [(value,)] = self.connection.con.query("SELECT true").result_set
# return isinstance(value, bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# [(value,)] = self.connection.con.query("SELECT true").result_set
# return isinstance(value, bool)

We definitely will not be supporting versions of chdb that don't support native booleans.

def ddl_script(self) -> Iterable[str]:
parquet_dir = self.data_dir / "parquet"
for sql in super().ddl_script:
yield sql.format(parquet_dir=parquet_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not introduce templating if we can avoid it. How about hard coding the path to ci/ibis-testing-data/... like we do with everything else?

"""
con = self.connection.con

con.query(f"CREATE DATABASE {database} ENGINE = Atomic")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work with our process-based parallel testing setup (i.e., pytest-xdist)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not there yet :)

@@ -175,6 +175,7 @@ class Options(Config):
default_backend: Optional[Any] = None
context_adjustment: ContextAdjustment = ContextAdjustment()
sql: SQL = SQL()
chdb: Optional[Config] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this?

@kszucs
Copy link
Member Author

kszucs commented May 15, 2024

I won't have time to keep working on this in the near future, so closing it.

@kszucs kszucs closed this May 15, 2024
@deepyaman deepyaman reopened this Jul 8, 2024
@zhenzhongxu
Copy link
Contributor

@auxten we've reopened the PR based on your feedback.
@kszucs, we'd like to move the Ibis/chDB integration forward. I'm wondering if you have any useful context to whoever will pick this up.

@jcrist
Copy link
Member

jcrist commented Aug 12, 2024

Going to reclose this for now. Can always reopen if we want to continue the work, but no need to keep it open if we're not hacking on it proactively.

@jcrist jcrist closed this Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants