
GH-452: Clarify use of RowGroup.ordinal field #453

Merged (2 commits) on Sep 25, 2024


ggershinsky (Contributor):

Encrypted files use three types of ordinals: row group, column and page. All three are simple local counters in both writers and readers. In addition, the row group ordinal is stored in the Parquet footer (the RowGroup.ordinal field). Parquet implementors would benefit from a clarification of why this field exists and how it is intended to be used.
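To make the three counters concrete, here is a minimal Python sketch of the writer side. The names `write_file`, `FileMeta` and `RowGroupMeta` are hypothetical stand-ins (not parquet-mr or Arrow APIs), the counters are assumed to be zero-based, and only the `ordinal` of each Thrift `RowGroup` is modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RowGroupMeta:
    # Hypothetical stand-in for the Thrift RowGroup struct; only the ordinal is modeled.
    ordinal: int


@dataclass
class FileMeta:
    # Hypothetical stand-in for the footer metadata.
    row_groups: List[RowGroupMeta] = field(default_factory=list)


def write_file(row_groups: List[List[int]]) -> Tuple[FileMeta, List[Tuple[int, int, int]]]:
    """row_groups[r][c] is the number of data pages in column chunk c of row group r.

    Returns the footer metadata and the (row group, column, page) ordinals
    the writer would use, in write order.
    """
    meta = FileMeta()
    ordinals = []
    for rg_ordinal, columns in enumerate(row_groups):       # row group counter
        for col_ordinal, num_pages in enumerate(columns):   # column counter, restarts in each row group
            for page_ordinal in range(num_pages):           # page counter, restarts in each column chunk
                ordinals.append((rg_ordinal, col_ordinal, page_ordinal))
        # Of the three counters, only the row group ordinal is persisted in the footer.
        meta.row_groups.append(RowGroupMeta(ordinal=rg_ordinal))
    return meta, ordinals
```

For example, `write_file([[2, 1], [1]])` yields the ordinals `(0, 0, 0)`, `(0, 0, 1)`, `(0, 1, 0)`, `(1, 0, 0)` and footer ordinals `[0, 1]`.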

ggershinsky (Contributor Author):

cc @mapleFU @pitrou

mapleFU (Member) left a comment:

Just curious:

  1. If multiple files are merged (or combined in some other way), would this field keep its original values, or should it be rewritten?
  2. Is this only needed for the AAD suffix?

Also cc @alamb

ggershinsky (Contributor Author):

> 1. If multiple files are merged (or combined in some other way), would this field keep its original values, or should it be rewritten?

Each encrypted Parquet file has a unique file id, which is used when authenticating every module of the file (to ensure that modules cannot be swapped, etc.). Each file also typically has a unique encryption key. Therefore, a merged file needs a new file id, new row group ordinals and a new key, and every module must be re-encrypted with the new key and AAD.
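A merge tool following this advice might first plan the new layout before re-encrypting anything. The sketch below uses hypothetical names (`plan_merge`, `SourceRowGroup`, `MergedRowGroup`) and only renumbers row groups and draws a fresh random file id. It assumes an 8-byte file id and a signed 16-bit ordinal limit, and it does not show key generation or the actual re-encryption of each module.

```python
import os
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: RowGroup.ordinal is a Thrift i16, so ordinals must stay <= 32767.
MAX_ORDINAL = 32767


@dataclass
class SourceRowGroup:
    # Hypothetical description of a row group in one of the input files.
    file_name: str
    ordinal: int


@dataclass
class MergedRowGroup:
    source: SourceRowGroup
    new_ordinal: int


def plan_merge(inputs: List[List[SourceRowGroup]],
               file_id_len: int = 8) -> Tuple[bytes, List[MergedRowGroup]]:
    """Assign a fresh file id and renumber row groups for a merged encrypted file."""
    new_file_id = os.urandom(file_id_len)  # the merged file gets its own unique id
    plan: List[MergedRowGroup] = []
    for row_groups in inputs:
        for rg in row_groups:
            new_ordinal = len(plan)  # ordinals restart at 0 and run across all inputs
            if new_ordinal > MAX_ORDINAL:
                raise ValueError("too many row groups for a 16-bit ordinal")
            plan.append(MergedRowGroup(rg, new_ordinal))
    return new_file_id, plan
```

Because both the file id and the row group ordinals change, every module's AAD changes too, so ciphertext from the input files cannot simply be copied into the merged file.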

> 2. Is this only needed for the AAD suffix?

The row group ordinal is part of the AAD suffix in most modules.
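To illustrate where the ordinals end up, here is a sketch of the module AAD suffix based on my reading of the Parquet modular encryption spec. The layout is the file's unique id, a one-byte module type, then 2-byte little-endian row group and column ordinals, plus a page ordinal only for data pages and data page headers. The footer suffix carries no ordinals. The module type codes and byte layout should be checked against the spec before relying on them; `module_aad_suffix` is a hypothetical helper, and the full AAD would be the optional AAD prefix followed by this suffix.

```python
import struct
from enum import IntEnum


class ModuleType(IntEnum):
    # Module type codes as I read them in the Parquet encryption spec (please verify).
    FOOTER = 0
    COLUMN_META_DATA = 1
    DATA_PAGE = 2
    DICTIONARY_PAGE = 3
    DATA_PAGE_HEADER = 4
    DICTIONARY_PAGE_HEADER = 5
    COLUMN_INDEX = 6
    OFFSET_INDEX = 7
    BLOOM_FILTER_HEADER = 8
    BLOOM_FILTER_BITSET = 9


# Only data pages and data page headers carry a page ordinal.
_PAGE_LEVEL = {ModuleType.DATA_PAGE, ModuleType.DATA_PAGE_HEADER}


def _ordinal_bytes(value: int) -> bytes:
    # Ordinals are encoded as 2-byte little-endian shorts.
    if not 0 <= value <= 32767:
        raise ValueError(f"ordinal {value} does not fit in a signed 16-bit short")
    return struct.pack("<h", value)


def module_aad_suffix(file_unique: bytes, module: ModuleType,
                      row_group: int = -1, column: int = -1, page: int = -1) -> bytes:
    suffix = file_unique + bytes([module])
    if module == ModuleType.FOOTER:
        # The footer is bound only to the file id, not to any row group.
        return suffix
    # Every other module includes the row group and column ordinals.
    suffix += _ordinal_bytes(row_group) + _ordinal_bytes(column)
    if module in _PAGE_LEVEL:
        suffix += _ordinal_bytes(page)
    return suffix
```

The defaults of `-1` make a missing ordinal fail loudly with `ValueError` instead of silently producing a wrong AAD.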

ggershinsky merged commit f4e3042 into apache:master on Sep 25, 2024 (3 checks passed).
ggershinsky deleted the gh452 branch on September 25, 2024 at 05:52.