Skip to content

Core: PartitionsTable#partitions returns incomplete list in case of partition evolution and NULL partition values#12528

Merged
pvary merged 5 commits into
apache:mainfrom
deniskuzZ:null_part_value
Jun 18, 2025
Merged

Core: PartitionsTable#partitions returns incomplete list in case of partition evolution and NULL partition values#12528
pvary merged 5 commits into
apache:mainfrom
deniskuzZ:null_part_value

Conversation

@deniskuzZ

@deniskuzZ deniskuzZ commented Mar 14, 2025

Copy link
Copy Markdown
Member

BugFix #12529

return 0;
}

if (o1 instanceof StructProjection && o2 instanceof StructProjection) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty big behavior change and I'm not sure it's the right place to fix this issue. The previous behavior created a comparator which only cared about those fields which are passed in the Constructor, this changes that to instead look at the raw lengths of the projections instead.

Can you explain a little more about the bug you are trying to fix and write some test cases which are currently failing to demonstrate the issue?

@deniskuzZ deniskuzZ Mar 14, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR to compare the number of projected fields.

The issue is that 2 projections with null and missing field values are considered equal. That leads to incorrect PartitionMap construction:

Below PartitionProjections are equal, so in PartitionMap we'll have just 1 random entry:

"ice_orc(company_id=null)"
"ice_orc(company_id=null/dept_id=null)"
"ice_orc(company_id=null/dept_id=null/team_id=null)"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added test: TestMetadataTableScansWithPartitionEvolution#testPartitionSpecEvolutionNullValues

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to break other things, because most other places in the code we do want those partitions to be equal

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's see, but right now that leads to invalid partition list evaluation

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's true, but that we should fix it in the partition list evaluation and not in the base equality method.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RussellSpitzer as per your recommendation, created a localized fix for PartitionsTable. Please take a look.

@github-actions github-actions Bot added core and removed API labels Mar 14, 2025
@github-actions github-actions Bot added the spark label Mar 15, 2025
@deniskuzZ deniskuzZ changed the title Core: PartitionsTable#partitions returns incomplete list in case of partition evolution and null partition values Core: PartitionsTable#partitions returns incomplete list in case of partition rename or partition evolution with null partition values Mar 15, 2025
@deniskuzZ deniskuzZ changed the title Core: PartitionsTable#partitions returns incomplete list in case of partition rename or partition evolution with null partition values Core: PartitionsTable#partitions returns incomplete list in case of partition rename or evolution with NULL partition values Mar 15, 2025
@deniskuzZ deniskuzZ changed the title Core: PartitionsTable#partitions returns incomplete list in case of partition rename or evolution with NULL partition values Core: PartitionsTable#partitions returns incomplete list in case of partition evolution and NULL partition values Mar 16, 2025
@github-actions github-actions Bot added the API label Mar 16, 2025
@deniskuzZ deniskuzZ force-pushed the null_part_value branch 4 times, most recently from 7ffc458 to c7ae2bd Compare March 17, 2025 12:50
.newFastAppend()
.appendFile(
newDataFile(
"company_id=__HIVE_DEFAULT_PARTITION__"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to do this with partition paths instead of structs? I generally would like to avoid using strings if at all possible

@deniskuzZ deniskuzZ Mar 17, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored to use struct

@RussellSpitzer

Copy link
Copy Markdown
Member

@szehon-ho Adding you since you were in this code most recently

@deniskuzZ

Copy link
Copy Markdown
Member Author

hi @szehon-ho, could you please take a look? thank you!

@deniskuzZ

Copy link
Copy Markdown
Member Author

hi @RussellSpitzer, do you know if anyone else could assist with the review?

@github-actions

github-actions Bot commented May 1, 2025

Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions Bot added the stale label May 1, 2025
@deniskuzZ

Copy link
Copy Markdown
Member Author

hi @pvary, could you please help move this forward? Thank you!

@github-actions github-actions Bot removed the stale label May 7, 2025
Comment thread core/src/main/java/org/apache/iceberg/PartitionsTable.java
Comment thread core/src/test/java/org/apache/iceberg/TestTables.java
Comment thread core/src/main/java/org/apache/iceberg/PartitionsTable.java
Comment thread core/src/main/java/org/apache/iceberg/PartitionsTable.java
Comment thread core/src/main/java/org/apache/iceberg/util/StructLikeMap.java
Comment thread core/src/test/java/org/apache/iceberg/TestTables.java Outdated

@pvary pvary left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine from my side.
Left some formatting request, but otherwise looks good to me

@pvary pvary merged commit fa162de into apache:main Jun 18, 2025
43 checks passed
@pvary

pvary commented Jun 18, 2025

Copy link
Copy Markdown
Contributor

Since there were no new comments in the last 2 weeks, I have merged the change.
Thanks @deniskuzZ for the PR and @RussellSpitzer for the review!

@deniskuzZ

Copy link
Copy Markdown
Member Author

Thank you, @pvary and @RussellSpitzer!

devendra-nr pushed a commit to devendra-nr/iceberg that referenced this pull request Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants