parquet-cli rewrite option #2981

Open
MyDELearnings opened this issue Aug 7, 2024 · 1 comment

Comments

@MyDELearnings
Describe the usage question you have. Please include as many useful details as possible.

Hi,

Is it possible for parquet-cli to read directly from a GCS bucket in order to prune a column? For example:
rewrite -i gs:/sourcebbucket/part-00549.parquet -o gs://targetbucket/newdata/dd --prune-columns col4

I am getting this error:
java.lang.RuntimeException: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"

Component(s)

No response

@wgtmac
Member

wgtmac commented Aug 8, 2024

I don't think parquet-cli can directly rewrite files that live in a cloud object store. You can either download them and rewrite them locally, or use the ParquetWriter API and set the file system configuration programmatically.
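
For the first option (download, rewrite locally, upload), a minimal sketch might look like the following. It assumes things the thread does not state: that `gsutil` is installed and authenticated, and that the parquet-cli launcher is available on the `PATH` as `parquet`. The `PARQUET_CLI`, `DRY_RUN`, and `WORKDIR` variables and the `prune_gcs_parquet` helper are hypothetical names used only for illustration.

```shell
# Sketch: prune a column from a Parquet file in GCS by staging it on local disk.
# Assumptions (not from the thread): gsutil is installed and authenticated, and
# parquet-cli can be launched as `parquet`. Override PARQUET_CLI if yours differs.
PARQUET_CLI="${PARQUET_CLI:-parquet}"
DRY_RUN="${DRY_RUN:-0}"   # set to 1 to print the commands instead of running them

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# prune_gcs_parquet SRC_URI DST_URI COLUMN
# Copies SRC_URI to a local work directory, prunes COLUMN with the rewrite
# command from the question, then uploads the result to DST_URI.
prune_gcs_parquet() {
  src="$1"; dst="$2"; column="$3"
  workdir="${WORKDIR:-$(mktemp -d)}"
  run gsutil cp "$src" "$workdir/input.parquet" || return 1
  run "$PARQUET_CLI" rewrite -i "$workdir/input.parquet" \
      -o "$workdir/output.parquet" --prune-columns "$column" || return 1
  run gsutil cp "$workdir/output.parquet" "$dst" || return 1
}
```

Calling `prune_gcs_parquet gs://sourcebbucket/part-00549.parquet gs://targetbucket/newdata/dd col4` with `DRY_RUN=1` only prints the three commands, which is a cheap way to check the paths before touching either bucket. Because parquet-cli only ever sees local paths here, the missing `gs` file system no longer matters.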
