
Spark: use Bulk deletion for manifests when import files to an iceberg table#13620

Merged: amogh-jahagirdar merged 2 commits into apache:main from dramaticlly:bulk-delete-manifests on Jul 22, 2025


@dramaticlly

Contributor

Leverage bulk deletion for manifests when the FileIO supports it, which helps speed up table imports that involve many files.
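For readers unfamiliar with the pattern, here is a minimal sketch of "bulk delete if the IO supports it, otherwise delete one file at a time". It is not the code from this PR: `SimpleFileIO`, `BulkCapableFileIO`, `RecordingBulkIO`, and `deleteManifests` are hypothetical stand-ins written for illustration, loosely modeled on Iceberg's `FileIO.deleteFile` and `SupportsBulkOperations.deleteFiles`, so that the example runs without Iceberg on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BulkDeleteSketch {
  // Hypothetical stand-in for a file IO that can only delete one path per call.
  public interface SimpleFileIO {
    void deleteFile(String path);
  }

  // Hypothetical stand-in for an IO that can also delete many paths in one call.
  public interface BulkCapableFileIO extends SimpleFileIO {
    void deleteFiles(Iterable<String> paths);
  }

  // Illustrative IO that records which delete method was invoked.
  public static class RecordingBulkIO implements BulkCapableFileIO {
    public final List<String> calls = new ArrayList<>();

    @Override
    public void deleteFile(String path) {
      calls.add("single:" + path);
    }

    @Override
    public void deleteFiles(Iterable<String> paths) {
      List<String> batch = new ArrayList<>();
      paths.forEach(batch::add);
      calls.add("bulk:" + batch);
    }
  }

  // Use one bulk call when the IO supports it; otherwise fall back to per-file deletes.
  public static void deleteManifests(SimpleFileIO io, List<String> manifestPaths) {
    if (io instanceof BulkCapableFileIO) {
      ((BulkCapableFileIO) io).deleteFiles(manifestPaths);
    } else {
      for (String path : manifestPaths) {
        io.deleteFile(path);
      }
    }
  }

  public static void main(String[] args) {
    List<String> paths = Arrays.asList("m1.avro", "m2.avro");

    RecordingBulkIO bulk = new RecordingBulkIO();
    deleteManifests(bulk, paths);
    System.out.println(bulk.calls); // prints [bulk:[m1.avro, m2.avro]]

    List<String> singles = new ArrayList<>();
    deleteManifests(path -> singles.add("single:" + path), paths);
    System.out.println(singles); // prints [single:m1.avro, single:m2.avro]
  }
}
```

With object stores, a single bulk request can typically replace many individual delete calls, which is presumably where the speedup for imports with many manifests comes from.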

The existing TestAddFilesProcedure already has many unit tests that cover this code path for both the Hive and Hadoop catalogs (Hadoop FileIO), including:

  • addFilteredPartitionsToPartitionedWithNullValueFilteringOnDept
  • addPartitionsWithNullValueShouldAddFilesToNullPartition
  • addAllPartitionsToPartitionedWithNullValue
  • addFilteredPartitionsToPartitioned2
  • addDataUnpartitioned
  • addPartitionToPartitionedHive
  • testAddFilesToTableWithManySpecs

…g table

Signed-off-by: Hongyue Zhang <hongyue.apache@gmail.com>
@github-actions github-actions Bot added the spark label Jul 21, 2025
Signed-off-by: Hongyue Zhang <hongyue.apache@gmail.com>

@amogh-jahagirdar amogh-jahagirdar left a comment

Contributor


Thanks @dramaticlly!

@amogh-jahagirdar amogh-jahagirdar merged commit 85cc58a into apache:main Jul 22, 2025
27 checks passed
manirajv06 pushed a commit to manirajv06/iceberg that referenced this pull request Jul 22, 2025
…berg table (apache#13620)

Signed-off-by: Hongyue Zhang <hongyue.apache@gmail.com>

Addressed review comments

Addressed review comments

Addressed review comments

Addressed review comments

Removed testLargeObjectUsingShortStringWithBigHeader(), left testLargeObject() as is and introduced testShortStringsInVariantPrimitives() to cover the short strings stored with 1 byte and 5 byte headers

Removed unnecessary lines

Fixed checkstyle warnings

Fixed unit test failure

Addressed review comments

Addressed review comments
@dramaticlly dramaticlly deleted the bulk-delete-manifests branch July 22, 2025 23:55
