Skip to content

Core: Avoid generating a large ManifestFile when committing#6335

Merged
aokolnychyi merged 2 commits into
apache:masterfrom
ConeyLiu:files-as-multiple-mf
Jul 28, 2023
Merged

Core: Avoid generating a large ManifestFile when committing#6335
aokolnychyi merged 2 commits into
apache:masterfrom
ConeyLiu:files-as-multiple-mf

Conversation

@ConeyLiu

@ConeyLiu ConeyLiu commented Dec 1, 2022

Copy link
Copy Markdown
Contributor

In our production env, we noticed the manifest files have a large random size, ranging from several KB to larger than 100 MB. It seems the MANIFEST_TARGET_SIZE_BYTES has not worked during the commit phase.

In this PR, we avoid generating a manifest file larger than MANIFEST_TARGET_SIZE_BYTES for newly added content files and will generate multiple manifest files when the size is covered.

@github-actions github-actions Bot added the core label Dec 1, 2022
@ConeyLiu

ConeyLiu commented Dec 1, 2022

Copy link
Copy Markdown
Contributor Author

Hi @szehon-ho @rdblue @pvary @stevenzwu @chenjunjiedada pls help to review this when you are free. Thanks a lot.

@amogh-jahagirdar amogh-jahagirdar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ConeyLiu! just had some questions/comments on this, but this behavior is something I observed as well.

Comment thread core/src/main/java/org/apache/iceberg/SnapshotProducer.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java Outdated
@nastra

nastra commented Dec 5, 2022

Copy link
Copy Markdown
Contributor

It is mentioned in the docs that MANIFEST_TARGET_SIZE_BYTES relates to Target size when merging manifest files, meaning that this setting only takes effect when merging of manifest files happens (e.g. when using newAppend()). Merging of manifest files would not happen when using newFastAppend() for example. This might explain why it seemed that this setting wouldn't take any effect in your env.

@ConeyLiu

ConeyLiu commented Dec 6, 2022

Copy link
Copy Markdown
Contributor Author

@nastra newAppend will create a large manifest as well. It only merges small manifests into a large one, and does not split large one into target file size.

Comment thread core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java Outdated
@rdblue

rdblue commented Dec 18, 2022

Copy link
Copy Markdown
Contributor

This seems like a reasonable thing to add to me. I'm actually more concerned about the use case that caused this.

@ConeyLiu, what was the case where you were creating huge manifests? How many data files were committed?

@ConeyLiu ConeyLiu force-pushed the files-as-multiple-mf branch from 3307397 to 6b21694 Compare December 19, 2022 07:06
@ConeyLiu

Copy link
Copy Markdown
Contributor Author

@rdblue thanks for the reviewing. It happens for the larger table (has several PBs data) and with many columns(thousand columns which is very common for log data or feature data). And the size of the DataFile metrics' is increased with the number of columns. So the manifest file could be really large when the user adds into/overwrites a larger number of data (eg, sync hourly data).

@ConeyLiu ConeyLiu force-pushed the files-as-multiple-mf branch from 6b21694 to 85c05cc Compare December 24, 2022 12:14

@RussellSpitzer RussellSpitzer left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two sets of changes here, one around "caching" and one around rolling over manifest files. Can we separate out the changes?

private final List<ManifestFile> appendManifests = Lists.newArrayList();
private final List<ManifestFile> rewrittenAppendManifests = Lists.newArrayList();
private ManifestFile newManifest = null;
private List<ManifestFile> cachedNewManifests = null;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RussellSpitzer Do you mean the caching part here? Here just rename the variable because it is plural now.

@ConeyLiu ConeyLiu force-pushed the files-as-multiple-mf branch from 85c05cc to 7ac0847 Compare May 12, 2023 07:03
@ConeyLiu

Copy link
Copy Markdown
Contributor Author

@rdblue @RussellSpitzer @nastra @amogh-jahagirdar Would you mind taking another look at this when you are free? Thanks a lot.

@zinking

zinking commented Jun 1, 2023

Copy link
Copy Markdown
Contributor

@rdblue @RussellSpitzer @szehon-ho can we have another look at this review and merge if possible ? thanks. large manifests have huge performance penalties for PetaByte tables.

@ConeyLiu ConeyLiu force-pushed the files-as-multiple-mf branch from d1e1f1d to 617647e Compare June 2, 2023 09:53
@szehon-ho

Copy link
Copy Markdown
Member

Yea sorry let me try to review this tomorrow.

Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/SnapshotProducer.java Outdated
@ConeyLiu ConeyLiu force-pushed the files-as-multiple-mf branch from 9074d09 to e27ba9b Compare July 7, 2023 15:13
import org.apache.iceberg.relocated.com.google.common.collect.Lists;

/** As opposed to {@link ManifestWriter}, a rolling writer could produce multiple manifest file. */
abstract class RollingManifestWriter<T extends ContentFile<T>> implements FileAppender<T> {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szehon-ho pls take a look if this is the right direction. Here implements FileAppender instead of FileWriter that's because the ManifestWriter implements FileAppender.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this definitely looks cleaner.

Im just not sure what to do with existing hierarchy (ManifestWriter), as there are many duplications.

Did we think about modifying ManifestWriter, to take extra variable 'targetFileSizeBytes', then trigger tryRollover only in this case.

The problem may be the method manifestFile() in writer? It should have been manifestFiles(). Maybe we can deprecate existing method?

CC @aokolnychyi .

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually on second read, this may be ok, as its a composition, and not duplicating that much code. Lets see if Anton has any comments

Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java Outdated
// // Operation succeeds and calls cleanUncommitted'
// txn.newFastAppend().appendFile(...).commit();
// // Some other operations ...
// // Commit fails and needs to clean up newManifests

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should this comment be after next one? (seems comments refer to the line above)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add the blank line between the comments to separate them.

Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/ManifestFiles.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/ManifestFiles.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/RollingManifestWriter.java Outdated
import org.apache.iceberg.relocated.com.google.common.collect.Lists;

/** As opposed to {@link ManifestWriter}, a rolling writer could produce multiple manifest file. */
abstract class RollingManifestWriter<T extends ContentFile<T>> implements FileAppender<T> {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this definitely looks cleaner.

Im just not sure what to do with existing hierarchy (ManifestWriter), as there are many duplications.

Did we think about modifying ManifestWriter, to take extra variable 'targetFileSizeBytes', then trigger tryRollover only in this case.

The problem may be the method manifestFile() in writer? It should have been manifestFiles(). Maybe we can deprecate existing method?

CC @aokolnychyi .

@aokolnychyi

Copy link
Copy Markdown
Contributor

I will take a look with fresh eyes by the end of tomorrow.

Comment thread core/src/main/java/org/apache/iceberg/ManifestOutputFileFactory.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/ManifestFiles.java Outdated
private void tryRollingToNewFile() {
currentFileRows++;

if (currentWriter.length() > targetFileSizeInBytes) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be expensive to call for every file? In rolling file writers, we check only every 1000 records. I understand that value may be too big but I still don't think it is a good idea to check the length for every entry.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about checking every 16? Just simply calculated as: 1000 / (512 / 8).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check the average cost of an entry in a manifest in some of your real tables? My assumption was something around 250 but let's check the average size of an entry.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked some tables. The average size has a lot to do with the schema of the table, the more columns take up more storage space. That's because of the column metrics. I noticed the biggest one (with thousand columns) needs about 5KB for one entry. The small tables (about 10 columns) need about several hundred bytes. So 250 should be an acceptable number.

Comment thread core/src/main/java/org/apache/iceberg/RollingManifestWriter.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/RollingManifestWriter.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/FastAppend.java
Comment thread core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java Outdated
@aokolnychyi

Copy link
Copy Markdown
Contributor

I feel like this would be super valuable for CTAS kind of use cases where a large chunk of data is being added at the same time. I had a few suggestions on how to simplify the rolling writer.

Nice work, @ConeyLiu!

@ConeyLiu ConeyLiu force-pushed the files-as-multiple-mf branch from 966a7f8 to d83ad26 Compare July 28, 2023 09:32
private boolean closed = false;

public RollingManifestWriter(
FileIO fileIO,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used to delete the empty file.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is still needed? We switched to a lazy writer that is initialized only when there is a new entry to write. The same also applies to cases when we roll to a new file. It can only happen if there is a pending entry to write. If I am not missing anything, we can drop FileIO here and simplify closeCurrentWriter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my bad, the empty file checking is not necessary.

@aokolnychyi aokolnychyi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me. I had one question. @ConeyLiu, could you check if I missed something?

deleteFile(cachedNewDataManifest.path());
this.cachedNewDataManifest = null;
if (cachedNewDataManifests != null) {
List<ManifestFile> committedNewManifests = Lists.newArrayList();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: committedNewManifests -> committedNewDataManifests?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

private boolean closed = false;

public RollingManifestWriter(
FileIO fileIO,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is still needed? We switched to a lazy writer that is initialized only when there is a new entry to write. The same also applies to cases when we roll to a new file. It can only happen if there is a pending entry to write. If I am not missing anything, we can drop FileIO here and simplify closeCurrentWriter.

@ConeyLiu

Copy link
Copy Markdown
Contributor Author

@aokolnychyi thanks for the details reviewing. The empty file checking is not necessary. Reverted related code changes.

@aokolnychyi aokolnychyi merged commit a7a09d4 into apache:master Jul 28, 2023
@aokolnychyi

Copy link
Copy Markdown
Contributor

Thanks, @ConeyLiu! Thanks everyone for reviewing!

@ConeyLiu

Copy link
Copy Markdown
Contributor Author

Thanks @aokolnychyi for mering this. Thanks @rdblue @RussellSpitzer @szehon-ho @amogh-jahagirdar for reviewing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants