[FEA] Parquet reader and writer should support BYTE_STREAM_SPLIT encoding #15226

Closed
etseidl opened this issue Mar 4, 2024 · 0 comments · Fixed by #15311
Labels
feature request New feature or request

Comments

@etseidl
Contributor

etseidl commented Mar 4, 2024

Is your feature request related to a problem? Please describe.
BYTE_STREAM_SPLIT encoding is the only Parquet encoding left that cuDF does not support. It was previously limited to FLOAT and DOUBLE columns, but there is a current proposal (apache/parquet-format#229, and the corresponding JIRA issue) to extend it to all fixed-width data types. When combined with compression, this encoding can provide significant space savings, and encoding/decoding it is less CPU-intensive than DELTA_BINARY_PACKED encoding.
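For readers unfamiliar with the encoding, here is a minimal Python sketch of the idea (not cuDF's C++/CUDA implementation). For N values of width K bytes, BYTE_STREAM_SPLIT scatters the PLAIN little-endian bytes into K streams, where stream k holds byte k of every value. The function names `bss_encode`/`bss_decode` are hypothetical, and the `fmt` argument assumes Python `struct` format characters (`f` = FLOAT, `d` = DOUBLE, `i` = INT32, `q` = INT64) stand in for the Parquet physical types, which covers the proposed extension to other fixed-width types.

```python
import struct
from typing import Sequence


def bss_encode(values: Sequence, fmt: str) -> bytes:
    """Sketch of BYTE_STREAM_SPLIT encoding (hypothetical helper).

    fmt is a struct format character, e.g. 'f', 'd', 'i', 'q'.
    """
    width = struct.calcsize("<" + fmt)  # K: bytes per value
    n = len(values)                     # N: number of values
    plain = struct.pack(f"<{n}{fmt}", *values)  # PLAIN layout: values back to back
    out = bytearray(len(plain))
    for i in range(n):
        for k in range(width):
            # Byte k of value i goes to position i of stream k.
            out[k * n + i] = plain[i * width + k]
    return bytes(out)


def bss_decode(data: bytes, fmt: str) -> list:
    """Inverse of bss_encode: gather the K streams back into PLAIN order."""
    width = struct.calcsize("<" + fmt)
    n = len(data) // width
    plain = bytearray(len(data))
    for i in range(n):
        for k in range(width):
            plain[i * width + k] = data[k * n + i]
    return list(struct.unpack(f"<{n}{fmt}", plain))


vals = [1.5, -2.25, 3.0]
enc = bss_encode(vals, "f")
# enc.hex() == "000000000000c010403fc040": the two low-order byte streams are
# all zeros, and the sign/exponent bytes are grouped together.
assert bss_decode(enc, "f") == vals
```

The transposition itself does not shrink the data. The space savings come from the compressor that runs afterwards, which usually sees longer runs of similar bytes within each stream than it would in the interleaved PLAIN layout.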

Describe the solution you'd like
Implement BYTE_STREAM_SPLIT encoding and decoding in cuDF.

Describe alternatives you've considered
Not supporting this encoding.

Additional context
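This encoding is a fairly straightforward variation on PLAIN encoding, but it may lead to memory-access performance issues because the input/output buffers must be accessed with a large stride: the bytes of a single value are spread across separate streams, each as long as the number of values.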
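The stride problem can be illustrated by comparing the byte offsets touched when reading one value under each encoding. This Python sketch uses hypothetical helper names (`plain_offsets`, `bss_offsets`, `bss_read_value`) and is not cuDF's implementation. Under PLAIN, the K bytes of value i are contiguous. Under BYTE_STREAM_SPLIT, they are N bytes apart, where N is the number of values in the buffer.

```python
import struct


def plain_offsets(i: int, width: int) -> list:
    # PLAIN: the bytes of value i are contiguous.
    return [i * width + k for k in range(width)]


def bss_offsets(i: int, width: int, n: int) -> list:
    # BYTE_STREAM_SPLIT: byte k of value i sits at position i of stream k,
    # so consecutive bytes of one value are n (the value count) apart.
    return [k * n + i for k in range(width)]


def bss_read_value(data: bytes, fmt: str, i: int):
    """Decode only value i from a BYTE_STREAM_SPLIT buffer via a strided gather.

    fmt is a struct format character (assumption: 'f', 'd', 'i', 'q', ...).
    """
    width = struct.calcsize("<" + fmt)
    n = len(data) // width
    raw = bytes(data[off] for off in bss_offsets(i, width, n))
    return struct.unpack("<" + fmt, raw)[0]


# For a page of one million DOUBLE values, reading value 0 touches bytes
# 0, 1_000_000, 2_000_000, ..., 7_000_000 instead of bytes 0..7.
print(bss_offsets(0, 8, 1_000_000))
print(plain_offsets(0, 8))
```

Each individual byte stream is still read sequentially when values are processed in order. The cost comes from having to walk K widely separated streams at the same time.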

@etseidl etseidl added the feature request New feature or request label Mar 4, 2024
rapids-bot bot pushed a commit that referenced this issue Apr 24, 2024
Closes #15226. Part of #13501. Adds support for reading and writing `BYTE_STREAM_SPLIT` encoded Parquet data. Includes a "microkernel" version like those introduced by #15159.

Authors:
  - Ed Seidl (https://github.com/etseidl)
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Muhammad Haseeb (https://github.com/mhaseeb123)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #15311
Projects
Archived in project

2 participants