Skip to content

API, Spark 4.0: Expose cleanExpiredMetadata in expire_snapshots Spark procedure#13509

Merged
nastra merged 1 commit into
apache:mainfrom
gaborkaszab:expose_cleanExpiredMetadata
Jul 11, 2025
Merged

API, Spark 4.0: Expose cleanExpiredMetadata in expire_snapshots Spark procedure#13509
nastra merged 1 commit into
apache:mainfrom
gaborkaszab:expose_cleanExpiredMetadata

Conversation

@gaborkaszab

Copy link
Copy Markdown
Contributor

No description provided.

* @param clean remove unused partition specs, schemas, or other metadata when true
* @return this for method chaining
*/
ExpireSnapshots cleanExpiredMetadata(boolean clean);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should throw an UOE so that we don't have to break the API and don't need the RevAPI exception: https://iceberg.apache.org/contribute/#adding-new-functionality-without-breaking-apis

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for the link! Done

table.updateSchema().addColumn("extra_col", Types.IntegerType.get()).commit();
table.updateSpec().addField("extra_col").commit();

final DataFile fileInNewSpec =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
final DataFile fileInNewSpec =
DataFile fileInNewSpec =

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


@Override
public ExpireSnapshotsSparkAction cleanExpiredMetadata(boolean clean) {
throw new UnsupportedOperationException();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be needed once the Interface throws an UOE by default

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

private Consumer<String> deleteFunc = null;
private ExecutorService deleteExecutorService = null;
private Dataset<FileInfo> expiredFileDS = null;
private Boolean cleanExpiredMetadata = null;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just make this a primitive boolean defaulting to false?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this is not a primitive is because in jobDesc() I'd like to differentiate between "not provided" and "provided as false" so that I can omit putting this info into the job desc when not provided.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to always just show the value of cleanExpiredMetadata in jobDesc(), hence I'm +1 on making this a primitive

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done

Comment on lines +168 to +170
if (cleanExpiredMetadata != null) {
expireSnapshots.cleanExpiredMetadata(cleanExpiredMetadata);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make it a primitive we can just remove the if and it'll be expireSnapshots.cleanExpiredMetadata(cleanExpiredMetadata);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above

Comment on lines +1399 to +1400
assertThat(table.specs().keySet()).doesNotContain(specIdToRemove);
assertThat(table.schemas().keySet()).doesNotContain(schemaIdToRemove);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Similar to the test above, could we make this a stronger assertion by asserting the exact contents of the specs/schemas rather than it doesn't containing the removed ones.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I also added a as() to these assert and I updated the test for the procedure too to be stronger similarly to these ones.

@gaborkaszab gaborkaszab force-pushed the expose_cleanExpiredMetadata branch from a222567 to 08e3506 Compare July 11, 2025 07:46
@gaborkaszab

Copy link
Copy Markdown
Contributor Author

Thanks for taking a look @nastra @amogh-jahagirdar !

@gaborkaszab gaborkaszab force-pushed the expose_cleanExpiredMetadata branch from 08e3506 to 4629c7d Compare July 11, 2025 08:00
@gaborkaszab gaborkaszab force-pushed the expose_cleanExpiredMetadata branch from 4629c7d to 1db5c78 Compare July 11, 2025 10:29
@amogh-jahagirdar amogh-jahagirdar changed the title API, Spark: Expose cleanExpiredMetadata in expire_snapshots Spark procedure API, Spark 4.0: Expose cleanExpiredMetadata in expire_snapshots Spark procedure Jul 11, 2025

@amogh-jahagirdar amogh-jahagirdar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gaborkaszab, this looks right to me. I'll hold for a bit in case @nastra or any others have any more feedback!

@nastra nastra merged commit 5bda66b into apache:main Jul 11, 2025
43 checks passed
@gaborkaszab

Copy link
Copy Markdown
Contributor Author

thanks for the reviews @amogh-jahagirdar @nastra !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants