Skip to content

Core: Ignore split offsets array when split offset is past file length#8925

Merged
amogh-jahagirdar merged 1 commit into
apache:mainfrom
amogh-jahagirdar:fix-split-offsets-2
Oct 28, 2023
Merged

Core: Ignore split offsets array when split offset is past file length#8925
amogh-jahagirdar merged 1 commit into
apache:mainfrom
amogh-jahagirdar:fix-split-offsets-2

Conversation

@amogh-jahagirdar

@amogh-jahagirdar amogh-jahagirdar commented Oct 26, 2023

Copy link
Copy Markdown
Contributor

Follow up from a miss in the fix done in #8860. There's an additional splitOffsetArray method which gets used here

return ((BaseFile<?>) file).splitOffsetArray();
. We would want to surface null for this array in case we know the split offsets are corrupted and invalid (stemming from an issue in 1.4.0). The split offsets are invalid when the final offset is beyond the end of the file.

Alternatively, we can undo the optimization done here: #8336 (comment) and then the original fix in #8860 would be sufficient, and be simpler. Since the optimization seemed impactful and there was community interest in it, this fix retains it and currently just fixes both cases explicitly.

@rdblue

rdblue commented Oct 26, 2023

Copy link
Copy Markdown
Contributor

@amogh-jahagirdar, maybe this time we should create a test to validate FileScanTask.split when there are bad split offsets? I think that would have caught this in the last PR.

@amogh-jahagirdar amogh-jahagirdar force-pushed the fix-split-offsets-2 branch 5 times, most recently from 114387d to 0ff2819 Compare October 27, 2023 20:47
@amogh-jahagirdar

Copy link
Copy Markdown
Contributor Author

@amogh-jahagirdar, maybe this time we should create a test to validate FileScanTask.split when there are bad split offsets? I think that would have caught this in the last PR.

Done, added a new test which will exercise both splitOffsetArray and splitOffsets. This will also validate the number of tasks per file

Comment thread core/src/test/java/org/apache/iceberg/TestSplitPlanning.java Outdated
Comment thread core/src/test/java/org/apache/iceberg/TestSplitPlanning.java Outdated
Comment thread core/src/test/java/org/apache/iceberg/TestSplitPlanning.java Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants