Spec: Deprecate Position delete files with row data#14045
Conversation
|
|
||
| ### Position Delete Files with Row Data | ||
|
|
||
| The feature allowing position delete files to include row data, which was introduced in the V2 Iceberg specification, is now deprecated in V3. The Java reference implementation will deprecate this feature in version 1.11.0 and remove it in version 1.12.0, including for V2 tables. Other implementations are encouraged to follow the same deprecation and removal schedule. |
There was a problem hiding this comment.
A bit weird since we never introduce this feature anywhere. Also, the tense used in these sentences, is now deprecated, will deprecate this feature will become invalid over time.
There was a problem hiding this comment.
The feature is there in 1.10.0. Technically available (while it is not used anywhere as far as we know).
1.11.0 is not released yet, so it is not deprecated yet. That is why I have used future tense here.
But I'm open to suggestions, as English is not my mother language.
There was a problem hiding this comment.
I'm not sure why we would mention this here at all
There was a problem hiding this comment.
I'm OK either way but I lean towards mentioning it in the implementation notes (as done here) just because it was technically a capability enabled in the spec and even though we do not know of any writers actually producing the row data, on the offhand that some unknown implementation did produce it and wants to later use the Java library with the expectation to consume that, they can reference this historical context.
In terms of spec wording, I think I'd remove
"Other implementations are encouraged to adopt the same deprecation and removal timeline" because I'm not sure it changes much. If in the slim chance an unknown implementation cares about producing position deletes with row data for whatever reason then they probably also want to consume it (for whatever reason)?
There was a problem hiding this comment.
Here is what I suggest for this section:
"Although the spec allows for including the deleted row itself (in addition to the path and position of the row in the data file) in v2 position delete files, writing the row is optional and no implementation currently writes it. The ability to write and read the row is supported in the Java implementation but is deprecated in version 1.11.0."
In PositionDelete.java, the deprecated methods have the comment "is deprecated in version 1.11.0", so we keep the same tense here. I don't think we need to say "will be removed in [the next release]" in the spec, as that is implied. In code, the method along with the javadoc comment will be removed, but we don't want to have to go back to the spec and update this section to say "... and is removed in version ..." (once it is done).
If other implementations do not even support writing/reading the deleted row, do we even need to say anything about them (in particular, that they should not do so in the future)? This appendix is "Implementation Notes", after all.
|
Since the vote has been passed, I have merged the PR. |
Deprecate Position delete files with row data based on the following discussion thread: https://lists.apache.org/thread/8jw6pb2vq3ghmdqf1yvy8n5n6gg1fq5s
The voting thread: https://lists.apache.org/thread/tfy96bqmz1bmdxr73x17w3xxj3yzs606
The #13870 could be useful when we do the actual deprecation