Spec: Add implementation note on current-snapshot-id #12334
Co-authored-by: Russell Spitzer <russell.spitzer@GMAIL.COM>
* Add spec notes for snapshot id assignment
* Add language for recommended generation of snapshot id
---------
Co-authored-by: Fokko Driesprong <fokko@apache.org>
The reference implementation uses a type 4 uuid and XORs the 4 most significant bytes with the 4 least significant bytes then ANDs with the maximum long value to arrive at a pseudo-random snapshot id with a low probability of collision.
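For readers who want to see the id-generation note in code, here is a minimal sketch of one reading of it. It assumes the "most significant" and "least significant" parts are the two 64-bit halves exposed by `java.util.UUID`; the class and method names are hypothetical and are not taken from the Iceberg codebase.

```java
import java.util.UUID;

// Hypothetical sketch; names are illustrative, not the Iceberg API.
final class SnapshotIdSketch {
    private SnapshotIdSketch() {}

    // Derives a non-negative, pseudo-random 64-bit id from a type 4 (random) UUID.
    public static long generateSnapshotId() {
        UUID uuid = UUID.randomUUID();
        // Assumption: fold the 128-bit UUID into 64 bits by XORing its two halves.
        long mixed = uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
        // AND with the maximum long value clears the sign bit, so the id is never negative.
        return mixed & Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(generateSnapshotId());
    }
}
```

Because the sign bit is masked off, the result carries at most 63 bits of randomness. Collisions are unlikely but not impossible, which matches the "low probability of collision" wording in the note.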
The reference Java implementation writes `-1` for "no current snapshot" with V1 and V2 tables and considers this equivalent to omitted or `null`. This has never been formalized in the spec but for compatibility, other implementations can accept `-1` as `null`. Java will no longer write `-1` and will use `null` for "no current snapshot" for all tables with a version greater than or equal to V3.
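As a rough illustration of how another implementation might follow this note, the sketch below normalizes a parsed `current-snapshot-id` on read and picks the value to write based on the format version. The class, method, and constant names are hypothetical, and the code assumes the field has already been parsed into a nullable `Long`.

```java
// Hypothetical sketch; names are illustrative, not the Iceberg API.
final class CurrentSnapshotIdSketch {
    // Legacy sentinel written for "no current snapshot" in V1 and V2 metadata.
    static final long LEGACY_NO_SNAPSHOT = -1L;

    private CurrentSnapshotIdSketch() {}

    // Read path: treat an omitted field, an explicit null, and -1 as "no current snapshot".
    public static Long normalize(Long rawValue) {
        if (rawValue == null || rawValue == LEGACY_NO_SNAPSHOT) {
            return null;
        }
        return rawValue;
    }

    // Write path, mirroring the behavior described for Java:
    // -1 for V1 and V2 tables, null for V3 and later.
    public static Long toWrite(Long currentSnapshotId, int formatVersion) {
        if (currentSnapshotId != null) {
            return currentSnapshotId;
        }
        return formatVersion >= 3 ? null : Long.valueOf(LEGACY_NO_SNAPSHOT);
    }
}
```

Normalizing on read keeps the rest of a reader free of sentinel checks, whichever format version produced the metadata.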
Nit: I still have the same confusion as in #12334 (comment): are these two sentences referring to the same implementation? To me, the language suggests they are separate ones.
The reference Java implementation writes `-1` for "no current snapshot" with V1 and V2 tables and considers this equivalent to omitted or `null`. This has never been formalized in the spec but for compatibility, other implementations can accept `-1` as `null`. Java will no longer write `-1` and will use `null` for "no current snapshot" for all tables with a version greater than or equal to V3.
Java writes `-1` for "no current snapshot" with V1 and V2 tables and considers this equivalent to omitted or `null`. This has never been formalized in the spec, but for compatibility, other implementations can accept `-1` as `null`. Java will no longer write `-1` and will use `null` for "no current snapshot" for all tables with a version greater than or equal to V3.
@szehon-ho Good point; thanks for iterating on this. What do you think of this form? It doesn't read well to call out "The reference Java implementation" twice. WDYT?
Should we just put "The reference Java implementation" in the first paragraph, the first time we mention it?
The reference Java implementation uses a type 4 uuid and XORs the 4 most significant bytes with the 4 least significant bytes then ANDs with the maximum long value to arrive at a pseudo-random snapshot id with a low probability of collision.
And then we can use "Java" thereafter (including in the second paragraph)? Then it's clearer that both paragraphs refer to the same one.
Yes, I think this aligns with the current state: https://github.com/apache/iceberg/pull/12334/files#diff-36347a47c3bf67ea2ef6309ea96201814032d21bb5f162dfae4045508c15588aR1761-R1763
See https://lists.apache.org/thread/54r4nm7qmr4vxhdpwmbx5rntynspskl7 Thanks everyone!
See for context: https://lists.apache.org/thread/gqqsnww6nqc50pddwn29blzghmb0m0h3