Skip to content

Core: Prevent empty Puffin file creation in DV writer#13666

Merged
amogh-jahagirdar merged 4 commits into
apache:mainfrom
geruh:pufflin
Jul 25, 2025
Merged

Core: Prevent empty Puffin file creation in DV writer#13666
amogh-jahagirdar merged 4 commits into
apache:mainfrom
geruh:pufflin

Conversation

@geruh

@geruh geruh commented Jul 24, 2025

Copy link
Copy Markdown
Member

The BaseDVFileWriter was creating Puffin files in its close() method, even when no delete operations were performed. Resulting in empty orphaned Puffin files being created and DeleteWriteResult is empty.

It seems by design that the puffin writer always creates an output stream even if no rows are written.

Repro steps

  1. Create MOR V3 Table with and insert some rows i.e.
    INSERT INTO test_table VALUES (10, 'a'), (20, 'b'), (30, 'c');

  2. Delete from table where a positional scan is not null but no rows match i.e.
    DELETE FROM test_table WHERE id = 25;

  3. Observe empty puffin files in data directory not referenced by any manifest.

@github-actions github-actions Bot added the core label Jul 24, 2025

@amogh-jahagirdar amogh-jahagirdar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @geruh for the fix! Just had a minor comment and could we also add a small test to TestDVWriters to sanity check that the result returned for this particular case has empty contents for the DVs, referenced data files, and rewritten deletes. I think we can additionally check the file system to make sure we're not creating additional files (e.g. list the table data location and make sure there's no puffin files)

Comment thread core/src/main/java/org/apache/iceberg/deletes/BaseDVFileWriter.java Outdated

@ebyhr ebyhr left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test to TestDVWriters or somewhere?

@github-actions github-actions Bot added the data label Jul 25, 2025
Comment thread data/src/test/java/org/apache/iceberg/io/TestDVWriters.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/io/TestDVWriters.java Outdated

@nastra nastra left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the test has been slightly updated. Thanks for fixing this @geruh

Comment thread data/src/test/java/org/apache/iceberg/io/TestDVWriters.java Outdated
@amogh-jahagirdar

Copy link
Copy Markdown
Contributor

Thanks @geruh , thanks @ebyhr @nastra for reviewing!

@amogh-jahagirdar amogh-jahagirdar merged commit e3763f1 into apache:main Jul 25, 2025
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants