Skip to content

Spark: Fix stats in rewrite metadata action#5691

Merged
Fokko merged 3 commits into
apache:masterfrom
rdblue:fix-spark-rewrite-action
Sep 2, 2022
Merged

Spark: Fix stats in rewrite metadata action#5691
Fokko merged 3 commits into
apache:masterfrom
rdblue:fix-spark-rewrite-action

Conversation

@rdblue

@rdblue rdblue commented Sep 1, 2022

Copy link
Copy Markdown
Contributor

This is an update to #5665. There was discussion on that PR about how to project the partition tuple to the correct final type, and this PR implements the update by passing the desired data file type into StructDataFile. I think this is the cleanest way to fix the problem.

Because StructDataFile does not implement StructLike and is instead passed into a ManifestWriter that uses the DataFile API, the projection can't happen in writeManifest. Inside StructDataFile, only the partition tuple's type can differ, so only that field needs to be projected.

@github-actions github-actions Bot added the spark label Sep 1, 2022
@rdblue

rdblue commented Sep 1, 2022

Copy link
Copy Markdown
Contributor Author

@Fokko, I took a look at #5665 and got it working. I'm opening this so that you have the option to either cherry-pick my changes into your PR or just go ahead and merge this.

@Fokko Fokko left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, much cleaner than my solution. Thanks for picking this up @rdblue

@rdblue

rdblue commented Sep 2, 2022

Copy link
Copy Markdown
Contributor Author

Thanks for the review, @Fokko! And nice work finding the problem.

@rdblue rdblue added this to the Iceberg 0.14.1 Release milestone Sep 2, 2022
rdblue added a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
* Core: Don't show dropped fields from the partition spec

* Use projection instead

* Use StructProjection in SparkDataFile.

Co-authored-by: Fokko Driesprong <fokko@apache.org>
rdblue added a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
* Core: Don't show dropped fields from the partition spec

* Use projection instead

* Use StructProjection in SparkDataFile.

Co-authored-by: Fokko Driesprong <fokko@apache.org>
rdblue added a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
* Core: Don't show dropped fields from the partition spec

* Use projection instead

* Use StructProjection in SparkDataFile.

Co-authored-by: Fokko Driesprong <fokko@apache.org>
rdblue added a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
* Core: Don't show dropped fields from the partition spec

* Use projection instead

* Use StructProjection in SparkDataFile.

Co-authored-by: Fokko Driesprong <fokko@apache.org>
rdblue added a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
* Core: Don't show dropped fields from the partition spec

* Use projection instead

* Use StructProjection in SparkDataFile.

Co-authored-by: Fokko Driesprong <fokko@apache.org>
rdblue added a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
* Core: Don't show dropped fields from the partition spec

* Use projection instead

* Use StructProjection in SparkDataFile.

Co-authored-by: Fokko Driesprong <fokko@apache.org>
rdblue added a commit that referenced this pull request Sep 3, 2022
* Core: Don't show dropped fields from the partition spec

* Use projection instead

* Use StructProjection in SparkDataFile.

Co-authored-by: Fokko Driesprong <fokko@apache.org>
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 10, 2023
* Core: Don't show dropped fields from the partition spec

* Use projection instead

* Use StructProjection in SparkDataFile.

Co-authored-by: Fokko Driesprong <fokko@apache.org>
zhongyujiang pushed a commit to zhongyujiang/iceberg that referenced this pull request Apr 16, 2025
* Core: Don't show dropped fields from the partition spec

* Use projection instead

* Use StructProjection in SparkDataFile.

Co-authored-by: Fokko Driesprong <fokko@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants