Core: Add file seq number to ManifestEntry#6002
Conversation
| * | ||
| * @param deletedFile a file | ||
| * @deprecated since 1.0.0, will be removed in 1.1.0; use {@link #delete(ContentFile, long)}. | ||
| * @deprecated since 1.1.0, will be removed in 1.2.0; use {@link #delete(ContentFile, long, |
There was a problem hiding this comment.
This did not make it to 1.0, just fixing the comment.
| * @param dataSequenceNumber a data sequence number of the file (assigned when the file was added) | ||
| * @param fileSequenceNumber a file sequence number (assigned when the file was added) | ||
| */ | ||
| public void delete(F deletedFile, long dataSequenceNumber) { |
There was a problem hiding this comment.
This was added recently and hasn't been released yet.
| * @param fileSequenceNumber a file sequence number (assigned when the file was added) | ||
| */ | ||
| public void existing( | ||
| F existingFile, long fileSnapshotId, long dataSequenceNumber, Long fileSequenceNumber) { |
There was a problem hiding this comment.
Note fileSequenceNumber is Long because the value may be null in old v2 manifests.
| validateManifest( | ||
| committedManifest, | ||
| seqs(appendSequenceNumber, appendSequenceNumber), | ||
| seqs(null, null), |
There was a problem hiding this comment.
This test mimics a rewrite of a v2 manifest that did not have file sequence numbers.
| this.sequenceNumber = null; | ||
| return; | ||
| case 3: | ||
| this.fileSequenceNumber = (Long) v; |
There was a problem hiding this comment.
Let me see if it safe to add it here. I think we use positions in RewriteManifestsSparkAction.
There was a problem hiding this comment.
Okay, it seems safe as we use a projection there. That being said, I will need to adapt it in a follow-up to persist the file sequence number as well.
There was a problem hiding this comment.
Should we do it in this PR rather than in a follow up? That seems like an important part and we don't want to release this without it.
There was a problem hiding this comment.
I agree that part must be done before releasing fileSequenceNumber. However, I'd do it in a separate PR immediately after this one as fileSequenceNumber is optional and the current logic does not cause correctness issues. It would be just easier to separate these two.
9edc478 to
9ea3af8
Compare
| * reading a v2 manifest that did not persist the file sequence number for manifest entries with | ||
| * status EXISTING or DELETED (older Iceberg versions). | ||
| */ | ||
| Long fileSequenceNumber(); |
There was a problem hiding this comment.
Do we have any sequence number usage case that cannot be handled by only using fileSequenceNumber? I 'm wondering can we replace dataSequenceNumber with fileSequenceNumber eventually.
There was a problem hiding this comment.
I think we need both as they serve different purposes. The data sequence number will still be used to decide whether a particular delete should apply. The file sequence number, on the other hand, will always tell us the sequence number of the snapshot that added this file. These values can be different during compaction as we may add a file with an old data sequence number.
There was a problem hiding this comment.
Oh, I see. Deletes should not apply to files that have larger dataSequenceNumber but smaller fileSequenceNumber. I guess when compacting data files with deletes may lead to adding existing files with old fileSequenceNumber, compacting v1 table should not have that case.
So we should add dataSequenceNumber to ContentFile since it will be used for planning, right?
There was a problem hiding this comment.
We already use dataSequenceNumber / sequenceNumber in scan planning. We just renamed how it was referenced in core recently. If we choose to expose these on ContentFile that's fine, but something we should do in a separate PR.
There was a problem hiding this comment.
Correct, these sequence numbers would serve different purposes and may be added to ContentFile later.
There was a problem hiding this comment.
IIUC, assuming df1 and df2 have data sequence number 1, once they are compacted to one data file df3. df3's data sequence number will be 1(the old one), and file sequence number is gong to be larger than 1, like 2 or 3. Is that correct?
How do we assign the data data sequence number, if df1 and df2 has different data sequence numbers?
df1 data seq no.: 1
df2 data seq no.: 2
compact df1 and df2 to df3
df3 data seq no.?
There was a problem hiding this comment.
@flyrain, this is something controlled by the following option in the compaction action.
/**
* If the compaction should use the sequence number of the snapshot at compaction start time for
* new data files, instead of using the sequence number of the newly produced snapshot.
*
* <p>This avoids commit conflicts with updates that add newer equality deletes at a higher
* sequence number.
*
* <p>Defaults to true.
*/
String USE_STARTING_SEQUENCE_NUMBER = "use-starting-sequence-number";
By default, df3 will use the sequence number of the snapshot at which the compaction started.
chenjunjiedada
left a comment
There was a problem hiding this comment.
LGTM, just one question.
9ea3af8 to
d9dcd08
Compare
| "File sequence number should default to 0", (Long) 0L, entry.fileSequenceNumber()); | ||
| V2Assert.assertEquals( | ||
| "File sequence number should match", fileSeqs.next(), entry.fileSequenceNumber()); | ||
| } |
There was a problem hiding this comment.
Nit: would be nice to have empty lines before and after the new block.
There was a problem hiding this comment.
I agree but it is not how this entire block is formatted. None of the other if statements are separated by empty lines. I decided to follow the existing formatting for consistency even though I'd surround each independent block by empty lines.
| */ | ||
| public void delete(F deletedFile, long dataSequenceNumber) { | ||
| addEntry(reused.wrapDelete(snapshotId, dataSequenceNumber, deletedFile)); | ||
| public void delete(F deletedFile, long dataSequenceNumber, Long fileSequenceNumber) { |
There was a problem hiding this comment.
It doesn't look like any of these delete methods are used, besides delete(ManifestEntry). Do we need to introduce this new method or can we update the deprecation for delete(ContentFile) to point people to delete(ManifestEntry)?
There was a problem hiding this comment.
The problem is that delete(ManifestEntry) is package-private just like ManifestEntry itself. This, on the other hand, is a public method.
rdblue
left a comment
There was a problem hiding this comment.
Looks correct to me. I had a few comments but merge when you're ready. The main question is whether to fix data file rewrites at the same time, but the big issue is making sure it gets done in the same release, not necessarily in this PR.
| Iceberg v2 adds a sequence number to the entry and makes the snapshot id optional. Both fields, `sequence_number` and `snapshot_id`, are inherited from manifest metadata when `null`. That is, if the field is `null` for an entry, then the entry must inherit its value from the manifest file's metadata, stored in the manifest list [2]. | ||
| The `sequence_number` field represents the data sequence number and must never change after a file is added to the dataset. | ||
| Iceberg v2 adds data and file sequence numbers to the entry and makes the snapshot ID optional. Values for these fields are inherited from manifest metadata when `null`. That is, if the field is `null` for an entry, then the entry must inherit its value from the manifest file's metadata, stored in the manifest list. | ||
| The `sequence_number` field represents the data sequence number and must never change after a file is added to the dataset. The `file_sequence_number` field represents the sequence number of the snapshot that added the file and must also remain unchanged upon assigning at commit. |
There was a problem hiding this comment.
I feel a bit hard to understand the difference between them. Can we add something like?
The `sequence_number` will use the one from original files in the case of compaction, but the `file_sequence_number` will alway be one from the snapshot added the file. `file_sequence_number` of a file may be larger or the same as its `sequence_number`.
There was a problem hiding this comment.
The sequence number in case of compaction would be the compaction snapshot seq number, not necessarily the original sequence number. However, I agree a clarification is needed. I adapted this paragraph to include more details.
d9dcd08 to
928b006
Compare
|
Thanks, @chenjunjiedada @rdblue @flyrain! I'll follow up with a PR to take the new field into account while rewriting manifests in Spark. |
This PR adds
file_sequence_numbertoManifestEntryto keep track of the sequence number at which a file was added.