Core: Batch load new files when validating replaced partitions #13556

Merged
amogh-jahagirdar merged 4 commits into apache:main from gabeiglio:batch-load-new-files
Aug 12, 2025

@gabeiglio gabeiglio commented Jul 14, 2025

Contributor

These changes ensure that data files are not all loaded into memory at once when validating replaced partitions. Instead, the validation uses ParallelIterable to load new files in batches, with a hard limit of 30k files in memory at a time.

This PR deprecates SnapshotUtil.newFiles in favor of SnapshotUtil.newFilesBetween.
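To illustrate the general idea of bounding memory while validating, here is a minimal sketch that drains a lazy iterator in fixed-size batches. It is not the code from this PR and does not use Iceberg's ParallelIterable or SnapshotUtil; the class `BatchedFileValidation`, the method `validateInBatches`, and the constant `DEFAULT_BATCH_SIZE` are hypothetical names chosen for the example, and only the 30k figure comes from the description above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of Iceberg.
final class BatchedFileValidation {
  // Assumption: mirrors the 30k in-memory limit mentioned in the PR description.
  static final int DEFAULT_BATCH_SIZE = 30_000;

  private BatchedFileValidation() {}

  /**
   * Pulls items from a lazy iterator and hands them to the validator in
   * batches of at most batchSize, so no more than batchSize items are
   * buffered by this method at any time. Returns the number of batches.
   */
  static <T> int validateInBatches(
      Iterator<T> files, int batchSize, Consumer<List<T>> validator) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
    }

    int batches = 0;
    List<T> batch = new ArrayList<>();
    while (files.hasNext()) {
      batch.add(files.next());
      if (batch.size() == batchSize) {
        validator.accept(batch);
        batches++;
        // Start a fresh list so the finished batch can be garbage collected.
        batch = new ArrayList<>();
      }
    }

    // Validate the final, possibly smaller, batch.
    if (!batch.isEmpty()) {
      validator.accept(batch);
      batches++;
    }
    return batches;
  }
}
```

A caller would pass an iterator over the new data files together with a validator that throws on the first conflict, for example `BatchedFileValidation.validateInBatches(files, BatchedFileValidation.DEFAULT_BATCH_SIZE, batch -> check(batch))`, where `check` is a placeholder for the partition-conflict test.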

@github-actions github-actions Bot added the core label Jul 14, 2025
Comment thread core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java Outdated
@gabeiglio gabeiglio force-pushed the batch-load-new-files branch from 42f0f83 to 4a0dd13 Compare July 15, 2025 00:31
@gabeiglio gabeiglio force-pushed the batch-load-new-files branch from 4a0dd13 to dcbecc6 Compare July 15, 2025 04:33
Comment thread core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java Outdated
@gabeiglio gabeiglio requested a review from bryanck July 18, 2025 16:38
Comment thread core/src/main/java/org/apache/iceberg/CherryPickOperation.java
Comment thread core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java
Comment thread core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java Outdated

@bryanck bryanck left a comment

Contributor

LGTM

@amogh-jahagirdar amogh-jahagirdar left a comment

Contributor

Thanks @gabeiglio, and @bryanck for reviewing!

@amogh-jahagirdar amogh-jahagirdar merged commit 159d253 into apache:main Aug 12, 2025
42 checks passed
fbertsch pushed a commit to fbertsch/iceberg that referenced this pull request Jan 19, 2026
…ons (apache#13556) - 1.4 (apache#824)

Optimize memory utilization by batching new files load when validating a
cherry-pick

Co-authored-by: Gabriel Igliozzi <gaboiglio@gmail.com>
3 participants