Skip to content

Core: Load snapshot after it has been committed to prevent accidental cleanup of files#15511

Merged
nastra merged 2 commits into
apache:mainfrom
nastra:load-snapshot-before-cleanup
Mar 16, 2026
Merged

Core: Load snapshot after it has been committed to prevent accidental cleanup of files#15511
nastra merged 2 commits into
apache:mainfrom
nastra:load-snapshot-before-cleanup

Conversation

@nastra

@nastra nastra commented Mar 4, 2026

Copy link
Copy Markdown
Contributor

#10523 added an optimization where the snapshot isn't loaded from the latest table state after it has been committed. After talking to @danielcweeks @amogh-jahagirdar @rdblue we want to be more defensive when cleaning up files and we should be doing the cleanup based on the latest table state.
This means that we should be loading the snapshot after it has been been committed in order to prevent accidental cleanup of files that are based on a snapshot state that isn't necessarily the one that's committed

@github-actions github-actions Bot added the core label Mar 4, 2026
@nastra nastra force-pushed the load-snapshot-before-cleanup branch from f768f5b to 6048f3b Compare March 4, 2026 17:41

// loadTable is executed once
Mockito.verify(adapter)
Mockito.verify(adapter, times(2))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've verified that this was called 3 times prior to c6b252e and prior to #10523. Given that we're adding back an additional table load this is now being called twice

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current loads are to refresh prior to commit and to refresh after commit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current loads are to refresh prior to commit and to refresh after commit?

yes, exactly

@nastra nastra closed this Mar 4, 2026
@nastra nastra reopened this Mar 4, 2026
}

@TestTemplate
public void manifestNotCleanedUpWhenSnapshotNotLoadableAfterCommit() {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test mainly hits the else branch in SnapshotProducer and thus skips manifest cleanup but I wasn't able to reproduce the theoretical issue that stuff is accidentally cleaned up and causes corrupt metadata so far

taskOps -> {
Snapshot newSnapshot = apply();
stagedSnapshot.set(newSnapshot);
newSnapshotId.set(newSnapshot.snapshotId());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set the snapshot ID here? It should be available everywhere using snapshotId().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really just restoring the behavior to how it was prior to 39373d0 so I didn't do any other modifications to the code other than adding a test

LOG.info("Committed snapshot {} ({})", newSnapshotId.get(), getClass().getSimpleName());

// at this point, the commit must have succeeded. after a refresh, the snapshot is loaded by
// id in case another commit was added between this commit and the refresh.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to mention the case we're fixing: that we may not know which attempt succeded in some cases and we will only clean up the one that actually did.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, added some additional wording around this

}
} else {
// saved may not be present if the latest metadata couldn't be loaded due to eventual
// consistency problems in refresh. in that case, don't clean up.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that I agree here. If we can't load the latest metadata, it should throw an exception. But anything could happen so unless we know exactly what was committed, it is safer to skip cleanup.

@nastra nastra Mar 16, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really just restoring the behavior + wording to how it was prior to 39373d0 so I didn't do any other modifications to the code other than adding a test.
The wording itself here was added a few years ago by 5300d27 and the commit details mention that no exception should be thrown

@rdblue rdblue left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comments on the documentation. This looks correct to me. Thanks, @nastra!

@nastra nastra force-pushed the load-snapshot-before-cleanup branch from 3abb755 to eb9d3df Compare March 16, 2026 09:11
@nastra

nastra commented Mar 16, 2026

Copy link
Copy Markdown
Contributor Author

thanks for reviewing @rdblue

@nastra nastra merged commit aa5894a into apache:main Mar 16, 2026
34 checks passed
@nastra nastra deleted the load-snapshot-before-cleanup branch March 16, 2026 11:23
nastra added a commit to nastra/iceberg that referenced this pull request Mar 16, 2026
nastra added a commit that referenced this pull request Mar 16, 2026
manuzhang pushed a commit to manuzhang/iceberg that referenced this pull request Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants