API, Spark 4.0: Expose cleanExpiredMetadata in expire_snapshots Spark procedure#13509
Conversation
5a89bc7 to
2f1dfe0
Compare
| * @param clean remove unused partition specs, schemas, or other metadata when true | ||
| * @return this for method chaining | ||
| */ | ||
| ExpireSnapshots cleanExpiredMetadata(boolean clean); |
There was a problem hiding this comment.
this should throw an UOE so that we don't have to break the API and don't need the RevAPI exception: https://iceberg.apache.org/contribute/#adding-new-functionality-without-breaking-apis
There was a problem hiding this comment.
I see. Thanks for the link! Done
| table.updateSchema().addColumn("extra_col", Types.IntegerType.get()).commit(); | ||
| table.updateSpec().addField("extra_col").commit(); | ||
|
|
||
| final DataFile fileInNewSpec = |
There was a problem hiding this comment.
| final DataFile fileInNewSpec = | |
| DataFile fileInNewSpec = |
|
|
||
| @Override | ||
| public ExpireSnapshotsSparkAction cleanExpiredMetadata(boolean clean) { | ||
| throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
shouldn't be needed once the Interface throws an UOE by default
2f1dfe0 to
a222567
Compare
| private Consumer<String> deleteFunc = null; | ||
| private ExecutorService deleteExecutorService = null; | ||
| private Dataset<FileInfo> expiredFileDS = null; | ||
| private Boolean cleanExpiredMetadata = null; |
There was a problem hiding this comment.
Could we just make this a primitive boolean defaulting to false?
There was a problem hiding this comment.
The reason this is not a primitive is because in jobDesc() I'd like to differentiate between "not provided" and "provided as false" so that I can omit putting this info into the job desc when not provided.
There was a problem hiding this comment.
I think it's fine to always just show the value of cleanExpiredMetadata in jobDesc(), hence I'm +1 on making this a primitive
| if (cleanExpiredMetadata != null) { | ||
| expireSnapshots.cleanExpiredMetadata(cleanExpiredMetadata); | ||
| } |
There was a problem hiding this comment.
If we make it a primitive we can just remove the if and it'll be expireSnapshots.cleanExpiredMetadata(cleanExpiredMetadata);
There was a problem hiding this comment.
see comment above
| assertThat(table.specs().keySet()).doesNotContain(specIdToRemove); | ||
| assertThat(table.schemas().keySet()).doesNotContain(schemaIdToRemove); |
There was a problem hiding this comment.
Nit: Similar to the test above, could we make this a stronger assertion by asserting the exact contents of the specs/schemas rather than it doesn't containing the removed ones.
There was a problem hiding this comment.
Sure. I also added a as() to these assert and I updated the test for the procedure too to be stronger similarly to these ones.
a222567 to
08e3506
Compare
|
Thanks for taking a look @nastra @amogh-jahagirdar ! |
08e3506 to
4629c7d
Compare
4629c7d to
1db5c78
Compare
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Thanks @gaborkaszab, this looks right to me. I'll hold for a bit in case @nastra or any others have any more feedback!
|
thanks for the reviews @amogh-jahagirdar @nastra ! |
No description provided.