Core: PartitionsTable#partitions returns incomplete list in case of partition evolution and NULL partition values#12528
Conversation
d310f23 to
0ec9faf
Compare
| return 0; | ||
| } | ||
|
|
||
| if (o1 instanceof StructProjection && o2 instanceof StructProjection) { |
There was a problem hiding this comment.
This is a pretty big behavior change and I'm not sure it's the right place to fix this issue. The previous behavior created a comparator which only cared about those fields which are passed in the Constructor, this changes that to instead look at the raw lengths of the projections instead.
Can you explain a little more about the bug you are trying to fix and write some test cases which are currently failing to demonstrate the issue?
There was a problem hiding this comment.
Updated the PR to compare the number of projected fields.
The issue is that 2 projections with null and missing field values are considered equal. That leads to incorrect PartitionMap construction:
Below PartitionProjections are equal, so in PartitionMap we'll have just 1 random entry:
"ice_orc(company_id=null)"
"ice_orc(company_id=null/dept_id=null)"
"ice_orc(company_id=null/dept_id=null/team_id=null)"
There was a problem hiding this comment.
added test: TestMetadataTableScansWithPartitionEvolution#testPartitionSpecEvolutionNullValues
There was a problem hiding this comment.
I think this is going to break other things, because most other places in the code we do want those partitions to be equal
There was a problem hiding this comment.
let's see, but right now that leads to invalid partition list evaluation
There was a problem hiding this comment.
I think that's true, but that we should fix it in the partition list evaluation and not in the base equality method.
There was a problem hiding this comment.
@RussellSpitzer as per your recommendation, created a localized fix for PartitionsTable. Please take a look.
0ec9faf to
6f9be44
Compare
5986d67 to
30ae2a4
Compare
30ae2a4 to
7bb81ce
Compare
b03dd4a to
5157d71
Compare
7ffc458 to
c7ae2bd
Compare
| .newFastAppend() | ||
| .appendFile( | ||
| newDataFile( | ||
| "company_id=__HIVE_DEFAULT_PARTITION__" |
There was a problem hiding this comment.
Is there a reason to do this with partition paths instead of structs? I generally would like to avoid using strings if at all possible
There was a problem hiding this comment.
no reason, I followed the style used in this test class:
https://github.com/apache/iceberg/pull/12528/files#diff-31a7139557d5a20af8245ddf3f9a4ff125094cd5d4f50b19e08104fecbabf98dR64-R66
There was a problem hiding this comment.
refactored to use struct
|
@szehon-ho Adding you since you were in this code most recently |
ad521fe to
45c48b2
Compare
|
hi @szehon-ho, could you please take a look? thank you! |
|
hi @RussellSpitzer, do you know if anyone else could assist with the review? |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
hi @pvary, could you please help move this forward? Thank you! |
pvary
left a comment
There was a problem hiding this comment.
Fine from my side.
Left some formatting request, but otherwise looks good to me
…artition evolution and null partition values
45c48b2 to
43acfdd
Compare
|
Since there were no new comments in the last 2 weeks, I have merged the change. |
|
Thank you, @pvary and @RussellSpitzer! |
…artition evolution and NULL partition values (apache#12528)
BugFix #12529