Skip to content

Spark: Rewrite V2 deletes to V3 DVs#12250

Merged
danielcweeks merged 1 commit into
apache:mainfrom
nastra:rewrite-position-deletes-to-dvs
Mar 11, 2025
Merged

Spark: Rewrite V2 deletes to V3 DVs#12250
danielcweeks merged 1 commit into
apache:mainfrom
nastra:rewrite-position-deletes-to-dvs

Conversation

@nastra

@nastra nastra commented Feb 13, 2025

Copy link
Copy Markdown
Contributor

No description provided.

@nastra nastra force-pushed the rewrite-position-deletes-to-dvs branch 4 times, most recently from 48f9fb6 to d2910f8 Compare February 13, 2025 11:06
@nastra nastra force-pushed the rewrite-position-deletes-to-dvs branch 3 times, most recently from 95ff7d4 to 4b0819a Compare February 14, 2025 14:56
@nastra nastra force-pushed the rewrite-position-deletes-to-dvs branch 4 times, most recently from ced1341 to a1aac8d Compare February 19, 2025 08:44
@nastra nastra force-pushed the rewrite-position-deletes-to-dvs branch from a1aac8d to 22d14d5 Compare February 26, 2025 16:51
@nastra nastra closed this Feb 26, 2025
@nastra nastra reopened this Feb 26, 2025
StructLike partition,
Map<String, String> writeProperties) {
Map<String, String> writeProperties,
int formatVersion) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we put the formatVersion in the SerializableTable or SerializableMetadataTable? I think the would be more generally useful since we may need that information for other use cases as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have the format version in SerializableTable but in #11620 we concluded that we don't want to support fetching a format version from a SerializableMetadataTable.

There was also some discussion in #11620 (comment) where we basically concluded that it's probably better to let the calling site fetch the underlying table from PositionDeletesTable instead of only supporting the retrieval of a format version from the PositionDeletesTable and not all the other metadata tables in TableUtil.

Also the reason why we're passing the formatVersion here is so that we avoid doing a distributed table metadata load on worker nodes.

Does that make sense or do you think we should still make some changes to properly support fetching the format version from the PositionDeletesTable in TableUtil/SerializableMetadataTable?

@danielcweeks danielcweeks Feb 27, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that if you capture the format version when you create the SerializableMetadataTable like we do with name here and it would be available for all metadata tables (not just positional deletes)

I believe @aokolnychyi had a similar comment and I'm not sure I follow @RussellSpitzer concern (other than we already have multiple places to check to find what format version we're working with).

In general, it would be nice if a table acts like a table and we can get the format version we're working against. This just feels weird because we're updating the methods to pass through info we should have from the table.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielcweeks I fully agree with everything you said. I think the only caveat is that there isn't a direct 1:1 mapping for each metadata table to its base table in terms of format version. This came already up in #11587 (comment).
For now I've handled the format version only for the PositionDeletesTable and we can re-visit the discussion about how we want to do it for the other metadata tables. Let me know what you think

@nastra nastra force-pushed the rewrite-position-deletes-to-dvs branch 2 times, most recently from 107453d to 98a00d1 Compare February 28, 2025 14:30
Comment thread core/src/main/java/org/apache/iceberg/TableUtil.java Outdated
@nastra nastra force-pushed the rewrite-position-deletes-to-dvs branch from 98a00d1 to 38b49ae Compare March 6, 2025 11:49
@nastra nastra force-pushed the rewrite-position-deletes-to-dvs branch from 38b49ae to 10a2b6a Compare March 7, 2025 06:30

@danielcweeks danielcweeks left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nastra !

@danielcweeks danielcweeks merged commit 4e76f70 into apache:main Mar 11, 2025
@nastra nastra deleted the rewrite-position-deletes-to-dvs branch March 12, 2025 05:14
nastra added a commit to nastra/iceberg that referenced this pull request Mar 21, 2025
nastra added a commit to nastra/iceberg that referenced this pull request Mar 21, 2025
amogh-jahagirdar pushed a commit that referenced this pull request Mar 22, 2025
lliangyu-lin pushed a commit to lliangyu-lin/iceberg that referenced this pull request Mar 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants