Write truncated parquet footer #3069

Open
Zand100 opened this issue Nov 18, 2024 · 0 comments

Describe the bug, including details regarding any error messages, version, and platform.

Sometimes a file is written with its last byte missing, so it ends in PAR instead of the expected PAR1 magic bytes. Attempting to read the file then fails with an EOFException.

$ hexdump -C good.snappy.parquet| tail -n 10
004fff70  6b 2e 6c 65 67 61 63 79  44 61 74 65 54 69 6d 65  |k.legacyDateTime|
004fff80  18 00 00 18 4a 70 61 72  71 75 65 74 2d 6d 72 20  |....Jparquet-mr |
004fff90  76 65 72 73 69 6f 6e 20  31 2e 31 32 2e 33 20 28  |version 1.12.3 (|
004fffa0  62 75 69 6c 64 20 66 38  64 63 65 64 31 38 32 63  |build f8dced182c|
004fffb0  34 63 31 66 62 64 65 63  36 63 63 62 33 31 38 35  |4c1fbdec6ccb3185|
004fffc0  35 33 37 62 35 61 30 31  65 36 65 64 36 62 29 19  |537b5a01e6ed6b).|
004fffd0  dc 1c 00 00 1c 00 00 1c  00 00 1c 00 00 1c 00 00  |................|
004fffe0  1c 00 00 1c 00 00 1c 00  00 1c 00 00 1c 00 00 1c  |................|
004ffff0  00 00 1c 00 00 1c 00 00  00 e7 0f 00 00 50 41 52  |.............PAR|
00500000
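
A minimal sketch for spotting such files before a reader hits the EOFException. It assumes only the Parquet file-format rule that a file with an unencrypted footer ends with a 4-byte little-endian footer length followed by the magic bytes PAR1. The function names are hypothetical, and the prefix match is a heuristic: a healthy non-Parquet file could also end in "P" or "PA".

```python
import os

MAGIC = b"PAR1"  # trailing magic of a Parquet file with an unencrypted footer


def check_parquet_tail(tail: bytes) -> str:
    """Classify the last bytes of a file (hypothetical helper, not a Parquet API)."""
    if tail.endswith(MAGIC):
        return "ok"
    # A file cut short by k bytes ends in a proper prefix of the magic.
    for missing in range(1, len(MAGIC)):
        if tail.endswith(MAGIC[:-missing]):
            return f"truncated: missing {missing} byte(s)"
    return "unknown"


def check_parquet_file(path: str) -> str:
    """Read the last 8 bytes (footer length + magic) and classify them."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - 8))
        return check_parquet_tail(f.read())
```

Fed the last eight bytes of the dump above (00 e7 0f 00 00 50 41 52), check_parquet_tail reports one missing byte. A check like this could run right after the write, so a job can rewrite the file instead of failing later on read.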

Possibly related: we see this issue only on GCP, not on AWS. On GCP we do random disk seeks, while on AWS we do sequential disk seeks.

When we rerun a job that wrote a corrupt Parquet file, it succeeds the second time, so the failure appears to be nondeterministic.

This is on version 1.14.3.

Component(s)

No response
