Skip to content

Core, Parquet, ORC: Don't write column sizes when metrics mode is None#10440

Merged
amogh-jahagirdar merged 1 commit into
apache:mainfrom
amogh-jahagirdar:parquet-metrics-fix
Jun 5, 2024
Merged

Core, Parquet, ORC: Don't write column sizes when metrics mode is None#10440
amogh-jahagirdar merged 1 commit into
apache:mainfrom
amogh-jahagirdar:parquet-metrics-fix

Conversation

@amogh-jahagirdar

@amogh-jahagirdar amogh-jahagirdar commented Jun 4, 2024

Copy link
Copy Markdown
Contributor

Currently, the Iceberg Parquet and ORC writers will write out column sizes even when metrics are disabled. This should not be the case since column sizes are optional in the spec and we should respect the property for this case.

@amogh-jahagirdar

Copy link
Copy Markdown
Contributor Author

Seems like ORC and possibly Avro writers also need to be updated, the ORC tests are failing

@github-actions github-actions Bot added the ORC label Jun 4, 2024
@amogh-jahagirdar amogh-jahagirdar changed the title Core, Parquet: Don't write column sizes when metrics mode is None Core, Parquet, ORC: Don't write column sizes when metrics mode is None Jun 4, 2024
@amogh-jahagirdar

Copy link
Copy Markdown
Contributor Author

Avro didn't need any changes which makes sense it's row oriented anyways. Fixed ORC.

@szehon-ho szehon-ho left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking along these lines and so makes sense to me. Nothing seems relying on this field?

@amogh-jahagirdar

amogh-jahagirdar commented Jun 5, 2024

Copy link
Copy Markdown
Contributor Author

@szehon-ho Yeah at least from my analysis there's no expectation on this being populated in the library (which makes sense it is optional as per the spec).
Thanks for the review!

@amogh-jahagirdar

Copy link
Copy Markdown
Contributor Author

Thanks for the reviews @szehon-ho @nastra @danielcweeks !

@amogh-jahagirdar amogh-jahagirdar merged commit be46d29 into apache:main Jun 5, 2024
jasonf20 pushed a commit to jasonf20/iceberg that referenced this pull request Aug 4, 2024
sasankpagolu pushed a commit to sasankpagolu/iceberg that referenced this pull request Oct 27, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants