Extend support for BYTE_STREAM_SPLIT to FIXED_LEN_BYTE_ARRAY, INT32, and INT64 primitive types #6048

Closed
anjakefala opened this issue Jul 12, 2024 · 6 comments · Fixed by #6159
Labels
enhancement: Any new improvement worthy of an entry in the changelog
parquet: Changes to the parquet crate

Comments

@anjakefala

anjakefala commented Jul 12, 2024

Please correct me if I'm wrong! It seems arrow-rs has added BYTE_STREAM_SPLIT support for float types, but not for other numerical data types like INT32.

Since then, the Parquet spec has been expanded to extend BYTE_STREAM_SPLIT encoding to other numerical primitive types: apache/parquet-format#229. The C++ PoC is here: apache/arrow#40094.

It would be good for arrow-rs to also support BYTE_STREAM_SPLIT encoding for FIXED_LEN_BYTE_ARRAY, INT32, and INT64.
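
To make the request concrete, here is a minimal sketch of the byte scattering the encoding performs, generalized to any fixed value width. It assumes the layout described in the Parquet spec: for N values of K bytes each, byte j of every value goes to stream j, and the K streams are concatenated. The names `byte_stream_split_encode` and `byte_stream_split_decode` are hypothetical and are not arrow-rs APIs; a real implementation would likely be specialized per width for speed.

```rust
/// Scatter `values` (N values of `width` bytes each, stored back to back)
/// into `width` byte streams and return the streams concatenated.
/// Hypothetical helper for illustration, not part of arrow-rs.
fn byte_stream_split_encode(values: &[u8], width: usize) -> Vec<u8> {
    assert!(width > 0 && values.len() % width == 0);
    let n = values.len() / width; // number of values
    let mut out = vec![0u8; values.len()];
    for (i, value) in values.chunks_exact(width).enumerate() {
        for (j, &byte) in value.iter().enumerate() {
            // Byte j of value i lands at position i of stream j.
            out[j * n + i] = byte;
        }
    }
    out
}

/// Inverse of `byte_stream_split_encode`: gather one byte from each
/// stream to rebuild every value. Hypothetical helper for illustration.
fn byte_stream_split_decode(encoded: &[u8], width: usize) -> Vec<u8> {
    assert!(width > 0 && encoded.len() % width == 0);
    let n = encoded.len() / width; // number of values
    let mut out = vec![0u8; encoded.len()];
    for i in 0..n {
        for j in 0..width {
            out[i * width + j] = encoded[j * n + i];
        }
    }
    out
}
```

For INT32 and INT64 the input would be the little-endian bytes of each value with `width` set to 4 or 8; for FIXED_LEN_BYTE_ARRAY, `width` would be the column's type length and the bytes are used as stored.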

anjakefala added the enhancement label Jul 12, 2024
@etseidl
Contributor

etseidl commented Jul 25, 2024

@anjakefala are you working on this? If not I have some free cycles to devote to this.

@anjakefala
Author

@etseidl I am not! I would love it if you picked it up. =)

@anjakefala
Author

Thank you so much @etseidl and @alamb! =)

@alamb
Contributor

alamb commented Aug 6, 2024

> Thank you so much @etseidl and @alamb! =)

Also @mapleFU 🙏

@etseidl
Contributor

etseidl commented Aug 6, 2024

@anjakefala please let me know if you run into performance issues, especially with FIXED_LEN_BYTE_ARRAY data. That type was particularly problematic on the read side. 😦

alamb added the parquet label Aug 31, 2024
@alamb
Contributor

alamb commented Aug 31, 2024

label_issue.py automatically added labels {'parquet'} from #6159

3 participants