Flink: Prevent recreation of ManifestOutputFileFactory during flushing by mxm · Pull Request #14358 · apache/iceberg

mxm · 2025-10-17T08:35:26Z

DynamicWriteResultAggregator uses the ManifestOutputFileFactory class to write a temporary manifest. For the Dynamic Sink, we want to support writing to a vast number of tables, even during a single checkpoint. So we avoid storing all factories and use a cache with an eviction policy.

The problem is that the factory for a given table may be evicted during a checkpoint flush while writes for that factory are still being processed. In that case, the factory is recreated and generates the same output path again, which leads to overwriting already written manifest files.

We must avoid recreating the output file factory during checkpoint flushing. It is fine to drop the factories due to cache eviction afterwards, as the output paths for factories are scoped by checkpoint id.
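
To illustrate the failure mode, here is a minimal, self-contained sketch. It does not use the actual Iceberg classes: `CountingPathFactory` and `FactoryCache` are hypothetical stand-ins for ManifestOutputFileFactory and the factory cache, and the path format is simplified. It only assumes what is described above, namely that the file counter lives in the factory instance and restarts when the factory is recreated.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class EvictionCollisionSketch {

  /** Hypothetical stand-in for ManifestOutputFileFactory: the counter is per instance. */
  static final class CountingPathFactory {
    private final String tableName;
    private final AtomicInteger fileCount = new AtomicInteger(0);

    CountingPathFactory(String tableName) {
      this.tableName = tableName;
    }

    String generatePath(long checkpointId) {
      // Simplified path format, chosen for illustration only.
      return String.format(
          Locale.ROOT, "%s/manifest-%d-%05d.avro",
          tableName, checkpointId, fileCount.incrementAndGet());
    }
  }

  /** Tiny LRU cache that keeps at most maxSize factories. */
  static final class FactoryCache extends LinkedHashMap<String, CountingPathFactory> {
    private final int maxSize;

    FactoryCache(int maxSize) {
      super(16, 0.75f, true); // access order gives LRU behaviour
      this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, CountingPathFactory> eldest) {
      return size() > maxSize;
    }
  }

  /** Looks up (or recreates) the factory for a table and asks it for the next path. */
  static String write(FactoryCache cache, String table, long checkpointId) {
    CountingPathFactory factory = cache.get(table);
    if (factory == null) {
      factory = new CountingPathFactory(table); // counter restarts at zero
      cache.put(table, factory);
    }
    return factory.generatePath(checkpointId);
  }

  public static void main(String[] args) {
    FactoryCache cache = new FactoryCache(1);
    long checkpointId = 7L;

    String first = write(cache, "db.t1", checkpointId);
    write(cache, "db.t2", checkpointId); // evicts the factory for db.t1
    String second = write(cache, "db.t1", checkpointId); // recreated mid-flush

    System.out.println(first.equals(second)); // prints true: the second write overwrites the first
  }
}
```

Both writes for `db.t1` happen within the same checkpoint, so the checkpoint id in the path does not help; only the per-instance counter distinguishes files, and it is lost on eviction.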

pvary · 2025-10-17T10:20:59Z

Thanks for the fix @mxm!
Good catch, and good analysis!

pvary · 2025-10-20T05:55:15Z

@mxm: Do we have the same issue for the writers?

pvary · 2025-10-20T08:04:21Z

Maybe we could add a new parameter to the ManifestOutputFileFactory constructor, like:

  ManifestOutputFileFactory(
      Supplier<Table> tableSupplier,
      Map<String, String> props,
      String flinkJobId,
      String operatorUniqueId,
      int subTaskId,
      long attemptNumber,
      UUID uniqueId) {

and if the UUID is provided then we can add it to the end of the filename, like:

  private String generatePath(long checkpointId) {
    return FileFormat.AVRO.addExtension(
        String.format(
            Locale.ROOT,
            "%s-%s-%05d-%d-%d-%05d-%s",
            flinkJobId,
            operatorUniqueId,
            subTaskId,
            attemptNumber,
            checkpointId,
            fileCount.incrementAndGet(),
            uniqueId));
  }

This would mean that the path generated by the normal sink is unchanged, but we can change it for every dynamically generated file name.
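
As a rough, self-contained sketch of that idea (not the actual Iceberg class): `SuffixedPathSketch` is a hypothetical name, the `tableSupplier` and `props` parameters are left out, and appending ".avro" by hand stands in for `FileFormat.AVRO.addExtension(...)`. The UUID is treated as optional, so a `null` value keeps the existing file name format.

```java
import java.util.Locale;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

final class SuffixedPathSketch {
  private final String flinkJobId;
  private final String operatorUniqueId;
  private final int subTaskId;
  private final long attemptNumber;
  private final UUID uniqueId; // null for the non-dynamic sink (assumption of this sketch)
  private final AtomicInteger fileCount = new AtomicInteger(0);

  SuffixedPathSketch(
      String flinkJobId,
      String operatorUniqueId,
      int subTaskId,
      long attemptNumber,
      UUID uniqueId) {
    this.flinkJobId = flinkJobId;
    this.operatorUniqueId = operatorUniqueId;
    this.subTaskId = subTaskId;
    this.attemptNumber = attemptNumber;
    this.uniqueId = uniqueId;
  }

  String generatePath(long checkpointId) {
    String base =
        String.format(
            Locale.ROOT,
            "%s-%s-%05d-%d-%d-%05d",
            flinkJobId,
            operatorUniqueId,
            subTaskId,
            attemptNumber,
            checkpointId,
            fileCount.incrementAndGet());
    // Only append the UUID when one was provided, so existing names stay stable.
    String name = uniqueId == null ? base : base + "-" + uniqueId;
    return name + ".avro";
  }

  public static void main(String[] args) {
    SuffixedPathSketch plain = new SuffixedPathSketch("job", "op", 1, 2L, null);
    SuffixedPathSketch first = new SuffixedPathSketch("job", "op", 1, 2L, UUID.randomUUID());
    SuffixedPathSketch recreated = new SuffixedPathSketch("job", "op", 1, 2L, UUID.randomUUID());

    System.out.println(plain.generatePath(7L));
    // Same counter value in both instances, but the UUID keeps the names apart.
    System.out.println(first.generatePath(7L).equals(recreated.generatePath(7L)));
  }
}
```

With a fresh UUID per factory instance, a factory that is recreated after eviction restarts its counter but still produces names that differ from those of the evicted instance.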

mxm · 2025-10-20T08:33:40Z

@mxm: Do we have the same issue for the writers?

Yes. The cache for RowDataTaskWriterFactory suffers from the same issue, albeit less severely because it already uses the LRUCache rather than a time-based expiration. It uses an underlying OutputFileFactory, which also uses a zero-based integer for the file suffix. It can be configured to use a suffix, though, which is what we should do.

mxm · 2025-10-20T08:37:07Z

Maybe we could add a new parameter to the ManifestOutputFileFactory constructor, like:
  ManifestOutputFileFactory(
      Supplier<Table> tableSupplier,
      Map<String, String> props,
      String flinkJobId,
      String operatorUniqueId,
      int subTaskId,
      long attemptNumber,
      UUID uniqueId) {
and if the UUID is provided then we can add it to the end of the filename, like:
  private String generatePath(long checkpointId) {
    return FileFormat.AVRO.addExtension(
        String.format(
            Locale.ROOT,
            "%s-%s-%05d-%d-%d-%05d-%s",
            flinkJobId,
            operatorUniqueId,
            subTaskId,
            attemptNumber,
            checkpointId,
            fileCount.incrementAndGet(),
            uniqueId));
  }
This would mean that the path generated by the normal sink is unchanged, but we can change it for every dynamically generated file name.

Originally, I was a bit hesitant to change the file names, but we agreed on this approach going forward in #14358 (comment). I'll update the PR as suggested by you above to retain the same file names for the non-dynamic IcebergSink.

pvary · 2025-10-20T10:33:40Z

@mxm: Do we have the same issue for the writers?

Yes. The cache for RowDataTaskWriterFactory suffers from the same issue, albeit less severely because it already uses the LRUCache rather than a time-based expiration. It uses an underlying OutputFileFactory, which also uses a zero-based integer for the file suffix. It can be configured to use a suffix, though, which is what we should do.

Do we plan to fix that in another PR, or do we fix the DynamicWriter in this PR?

mxm · 2025-10-20T10:40:47Z

I would fix this here. I started drafting a solution for DynamicWriter, but I got stuck while testing. I need a bit more time.

mxm · 2025-10-20T12:02:28Z

I conclude that a fix in DynamicWriter is not required. The reason is that the OutputFileFactory (in Iceberg core), in contrast to ManifestOutputFileFactory, is already scoped via a random UUID for the operation id (iceberg/core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java, line 138 in 04e51ba):

  this.operationId = UUID.randomUUID().toString();

We call it from here (iceberg/flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/RowDataTaskWriterFactory.java, line 171 in 04e51ba):

  OutputFileFactory.builderFor(table, taskId, attemptId)

I've pushed a test to verify that.

pvary · 2025-10-20T12:22:50Z

I conclude that a fix in DynamicWriter is not required. The reason is that the OutputFileFactory (in Iceberg core), in contrast to ManifestOutputFileFactory, is already scoped via a random UUID for the operation id (iceberg/core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java, line 138 in 04e51ba):

  this.operationId = UUID.randomUUID().toString();

We call it from here (iceberg/flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/RowDataTaskWriterFactory.java, line 171 in 04e51ba):

  OutputFileFactory.builderFor(table, taskId, attemptId)

I've pushed a test to verify that.

The funny thing is that the unique part there is operationId and the suffix is different (not used).

pvary

+1 pending tests

mxm · 2025-10-20T12:47:28Z

The funny thing is that the unique part there is operationId and the suffix is different (not used).

Yes, I discovered that when I tried using the suffix to insert a UUID, but the file path already contained a UUID which kept changing. I suppose we could replace the operator id from ManifestOutputFileFactory, but that's maybe something for another day.

pvary · 2025-10-20T13:17:39Z

Merged to main.
Thanks @mxm for the fix, @amogh-jahagirdar for the review, and @zncleon for reporting it!

huaxingao · 2025-11-10T22:42:51Z

@mxm Could you please back-port both this change and #14385 to 1.10.x? Thanks a lot!

mxm · 2025-11-12T11:38:17Z

@huaxingao Thanks for the reminder. Here it is: #14571

github-actions Bot added the flink label Oct 17, 2025

mxm mentioned this pull request Oct 17, 2025

ClassCastException with the DynamicIcebergSink #14251

Closed

3 tasks

pvary reviewed Oct 17, 2025

Comment thread .../flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicWriteResultAggregator.java

pvary reviewed Oct 17, 2025

Comment thread .../flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicWriteResultAggregator.java Outdated

pvary reviewed Oct 17, 2025

Comment thread flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/LRUCache.java Outdated

pvary added this to the Iceberg 1.10.1 milestone Oct 17, 2025

Change implementation to use a UUID for file names to allow factory re-creation during flushing

d0ceaf7

github-actions Bot added the build label Oct 17, 2025

pvary reviewed Oct 17, 2025

Comment thread .../flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicWriteResultAggregator.java Outdated

fixup! Use super type

586f90b

Preserve old file names for non-dynamic sink

04e51ba

Verify that DynamicWriter does not suffer from the same issue

3f01fcb

pvary approved these changes Oct 20, 2025

pvary merged commit 911a486 into apache:main Oct 20, 2025
18 checks passed

mxm deleted the deltamanifest-file-scoping branch October 21, 2025 08:47

mxm added a commit to mxm/iceberg that referenced this pull request Oct 21, 2025

Flink: Backport apache#14358 to Flink 2.0

4b47dbc

mxm added a commit to mxm/iceberg that referenced this pull request Oct 21, 2025

Flink: Backport apache#14358 to Flink 1.20

a476cd4

mxm mentioned this pull request Oct 21, 2025

Flink: Backport #14358: Prevent recreation of ManifestOutputFileFactory during flushing #14385

Merged

pvary pushed a commit that referenced this pull request Oct 21, 2025

Flink: Backport Prevent recreation of ManifestOutputFileFactory during flushing (#14385) backports #14358

fa4890e

mxm added a commit to mxm/iceberg that referenced this pull request Nov 12, 2025

Flink: Prevent recreation of ManifestOutputFileFactory during flushing (apache#14358)

f29b7f8

mxm mentioned this pull request Nov 12, 2025

[1.10.x] Backport #14358: Flink: Prevent recreation of ManifestOutputFileFactory during flushing #14571

Merged

mxm added a commit to mxm/iceberg that referenced this pull request Nov 12, 2025

Flink: Prevent recreation of ManifestOutputFileFactory during flushing (apache#14358)

482cfdd

huaxingao pushed a commit that referenced this pull request Nov 12, 2025

Flink: Prevent recreation of ManifestOutputFileFactory during flushing (#14358) (#14571)

c6b71ef

thomaschow pushed a commit to thomaschow/iceberg that referenced this pull request Jan 19, 2026

Flink: Prevent recreation of ManifestOutputFileFactory during flushing (apache#14358)

469b7fa

thomaschow pushed a commit to thomaschow/iceberg that referenced this pull request Jan 19, 2026

Flink: Backport Prevent recreation of ManifestOutputFileFactory during flushing (apache#14385) backports apache#14358

9df2858

talatuyarer pushed a commit to talatuyarer/iceberg that referenced this pull request Apr 1, 2026

Flink: Prevent recreation of ManifestOutputFileFactory during flushing (apache#14358)

cb12fad

talatuyarer pushed a commit to talatuyarer/iceberg that referenced this pull request Apr 1, 2026

Flink: Backport Prevent recreation of ManifestOutputFileFactory during flushing (apache#14385) backports apache#14358

d3257fe

stevenzwu mentioned this pull request May 19, 2026

Docs: Add release notes for 1.11.0 #16431

Merged
