Spark 4.0: Add schema conversion support for default values#14407

Merged
amogh-jahagirdar merged 7 commits into
apache:mainfrom
geruh:default-write
Oct 29, 2025
@geruh geruh commented Oct 23, 2025

Member

This PR adds support for default values in Spark. During the conversion of an Iceberg schema to Spark's StructType, default values are now passed through to Spark's column metadata using the CURRENT_DEFAULT and EXISTS_DEFAULT keys that Spark recognizes.

The changes extend the TypeToSparkType class to extract default values from Iceberg fields and convert them to Spark SQL string representations, so that Spark can understand and use the defaults defined in Iceberg.
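
To illustrate the idea, here is a minimal, dependency-free sketch of how field defaults could be rendered as SQL literal strings and stored under the two metadata keys. It is not the code from this PR and it does not use the Iceberg or Spark APIs: the class and method names are hypothetical, the literal rendering covers only strings, booleans, and integers, and the mapping of the write default to CURRENT_DEFAULT and the initial default to EXISTS_DEFAULT is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model for illustration only; not the Iceberg or Spark API.
final class DefaultMetadataSketch {
  // Metadata key names as given in the PR description.
  static final String CURRENT_DEFAULT = "CURRENT_DEFAULT";
  static final String EXISTS_DEFAULT = "EXISTS_DEFAULT";

  private DefaultMetadataSketch() {}

  // Renders a default value as a SQL literal string.
  // Simplified: only strings, booleans, ints, and longs are handled.
  static String toSqlLiteral(Object value) {
    if (value instanceof String) {
      // Single quotes inside a SQL string literal are escaped by doubling them.
      return "'" + ((String) value).replace("'", "''") + "'";
    }
    if (value instanceof Boolean) {
      return ((Boolean) value) ? "TRUE" : "FALSE";
    }
    if (value instanceof Integer || value instanceof Long) {
      return String.valueOf(value);
    }
    throw new IllegalArgumentException("Unsupported default value: " + value);
  }

  // Builds the column metadata entries for one field.
  // Assumption: write default -> CURRENT_DEFAULT, initial default -> EXISTS_DEFAULT.
  // A null default means the corresponding key is omitted.
  static Map<String, String> defaultMetadata(Object initialDefault, Object writeDefault) {
    Map<String, String> metadata = new LinkedHashMap<>();
    if (writeDefault != null) {
      metadata.put(CURRENT_DEFAULT, toSqlLiteral(writeDefault));
    }
    if (initialDefault != null) {
      metadata.put(EXISTS_DEFAULT, toSqlLiteral(initialDefault));
    }
    return metadata;
  }
}
```

In this sketch, a field with only a write default of "x" would yield a single CURRENT_DEFAULT entry holding the quoted literal, while a field with neither default would yield empty metadata.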

Tests for initial defaults weren't added here since that functionality already works without these changes. I'll follow up with tests for them in this new test suite.

Note: The current tests focus on default write capabilities, because partial column inserts for DSV2 tables aren't available until Spark 4.1.0, per apache/spark#50044.

@github-actions github-actions Bot added the spark label Oct 23, 2025

@amogh-jahagirdar amogh-jahagirdar left a comment

Contributor

Overall I think this looks pretty good for the first part of respecting default values in Spark. Just some minor comments.
Thank you @geruh!

Comment thread spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/TestSparkSchemaUtil.java Outdated
Comment thread spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java Outdated
Comment thread spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java Outdated
@huaxingao

Contributor

@geruh Thanks for the PR! It looks good to me overall. Just left a few minor comments.

@amogh-jahagirdar amogh-jahagirdar changed the title Spark: Add schema conversion support for default values Spark 4.0: Add schema conversion support for default values Oct 29, 2025

@amogh-jahagirdar amogh-jahagirdar left a comment

Contributor

Thanks @geruh, I'll leave it up for a bit in case anyone else has any comments.

@amogh-jahagirdar

Contributor

Thanks @geruh and @huaxingao for reviewing!

@amogh-jahagirdar amogh-jahagirdar merged commit a99dc4f into apache:main Oct 29, 2025
27 checks passed
thomaschow pushed a commit to thomaschow/iceberg that referenced this pull request Jan 19, 2026
talatuyarer pushed a commit to talatuyarer/iceberg that referenced this pull request Apr 1, 2026
3 participants