Spark: Rewrite V2 deletes to V3 DVs#12250
Conversation
48f9fb6 to
d2910f8
Compare
95ff7d4 to
4b0819a
Compare
ced1341 to
a1aac8d
Compare
a1aac8d to
22d14d5
Compare
| StructLike partition, | ||
| Map<String, String> writeProperties) { | ||
| Map<String, String> writeProperties, | ||
| int formatVersion) { |
There was a problem hiding this comment.
Couldn't we put the formatVersion in the SerializableTable or SerializableMetadataTable? I think the would be more generally useful since we may need that information for other use cases as well.
There was a problem hiding this comment.
we already have the format version in SerializableTable but in #11620 we concluded that we don't want to support fetching a format version from a SerializableMetadataTable.
There was also some discussion in #11620 (comment) where we basically concluded that it's probably better to let the calling site fetch the underlying table from PositionDeletesTable instead of only supporting the retrieval of a format version from the PositionDeletesTable and not all the other metadata tables in TableUtil.
Also the reason why we're passing the formatVersion here is so that we avoid doing a distributed table metadata load on worker nodes.
Does that make sense or do you think we should still make some changes to properly support fetching the format version from the PositionDeletesTable in TableUtil/SerializableMetadataTable?
There was a problem hiding this comment.
I would think that if you capture the format version when you create the SerializableMetadataTable like we do with name here and it would be available for all metadata tables (not just positional deletes)
I believe @aokolnychyi had a similar comment and I'm not sure I follow @RussellSpitzer concern (other than we already have multiple places to check to find what format version we're working with).
In general, it would be nice if a table acts like a table and we can get the format version we're working against. This just feels weird because we're updating the methods to pass through info we should have from the table.
There was a problem hiding this comment.
@danielcweeks I fully agree with everything you said. I think the only caveat is that there isn't a direct 1:1 mapping for each metadata table to its base table in terms of format version. This came already up in #11587 (comment).
For now I've handled the format version only for the PositionDeletesTable and we can re-visit the discussion about how we want to do it for the other metadata tables. Let me know what you think
107453d to
98a00d1
Compare
98a00d1 to
38b49ae
Compare
38b49ae to
10a2b6a
Compare
this backports apache#12250 to Spark 3.4
this backports apache#12250 to Spark 3.4
this backports #12250 to Spark 3.4
this backports apache#12250 to Spark 3.4
No description provided.