Implement Arrow PyCapsule Interface & make pyarrow optional dependency #268

Open
kylebarron opened this issue Jul 22, 2024 · 1 comment
Labels
🦀 rust 🦀 Pull requests that edit Rust code feature request

kylebarron commented Jul 22, 2024

The Arrow project recently created a new protocol for sharing Arrow data in Python, the Arrow PyCapsule Interface. One of its goals is to allow exporting and importing Arrow data in Python without necessarily using PyArrow as an intermediary.

The protocol allows Arrow-exportable objects to be recognized by the presence of one of several dunder methods, such as `__arrow_c_schema__`, `__arrow_c_array__`, or `__arrow_c_stream__`.
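
To make the recognition step concrete, here is a minimal sketch of how a consumer might check for those methods. It uses only the standard library; `arrow_export_methods` and `FakeSheet` are hypothetical names for illustration and are not part of fastexcel or any Arrow library.

```python
# Dunder method names defined by the Arrow PyCapsule Interface.
ARROW_DUNDERS = ("__arrow_c_schema__", "__arrow_c_array__", "__arrow_c_stream__")


def arrow_export_methods(obj):
    """Return the PyCapsule Interface dunder methods that `obj` exposes."""
    return [name for name in ARROW_DUNDERS if callable(getattr(obj, name, None))]


class FakeSheet:
    """Hypothetical stand-in for an Arrow-exportable object (not fastexcel's API)."""

    def __arrow_c_array__(self, requested_schema=None):
        # Placeholder body: a real producer would return PyCapsule objects here.
        raise NotImplementedError("illustration only")


print(arrow_export_methods(FakeSheet()))  # the sheet-like object is recognized
print(arrow_export_methods(object()))     # a plain object exposes none of them
```

The check never imports pyarrow, which is the point of the protocol: a consumer only needs to know the method names.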

A growing number of Python Arrow libraries are aware of the PyCapsule interface and would then be able to read from fastexcel directly, without going through pyarrow or even having it installed in the environment.

For example, I have a PR open for polars in pola-rs/polars#17693, but you could also pass the fastexcel object directly into constructors from pyarrow, nanoarrow, or arro3. I'm also advocating for more projects to adopt the PyCapsule interface directly, including duckdb, datafusion, vegafusion, and daft.

In terms of implementation, fastexcel currently uses arrow-rs' default pyarrow integration. Instead, you would need to define one or more dunder methods, probably on ExcelSheet. If you always return a RecordBatch, you could implement `__arrow_c_array__`; if you ever wanted to expose a lazy stream, you could implement `__arrow_c_stream__`, which exports multiple batches of data.

I have a helper library, pyo3-arrow, that you can use to implement this; it is separate from arrow-rs for a few reasons. Alternatively, the relevant code is small and self-contained enough to vendor if you don't want to add an external dependency.

@lukapeschke lukapeschke added 🦀 rust 🦀 Pull requests that edit Rust code feature request labels Jul 23, 2024
@lukapeschke
Collaborator

Thanks for the heads-up, I'll try to look into this when I have the time 🙂
