
Spark: Add sort_by parameter to rewrite_manifests procedure #15467

Merged
RussellSpitzer merged 3 commits into apache:main from hemanthboyina:rewrite_manifests_sortby on Mar 9, 2026


@hemanthboyina

Contributor

This PR adds the sort_by parameter to the rewrite_manifests stored procedure, exposing the sortBy functionality
that was added to RewriteManifestsSparkAction. Currently, custom manifest clustering by partition fields
is only accessible through the Java API. This change allows SQL users to specify which partition fields to cluster
manifests by, which can reduce scan planning time by letting Spark skip manifests that don't contain the
partition values a query filters on.

Example:
CALL catalog.system.rewrite_manifests(table => 'db.sample', sort_by => array('category'));
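The planning benefit described above can be illustrated with a toy model. This Java sketch is not Iceberg code: it assumes each data file has a single value for a hypothetical `category` partition field, packs files into fixed-size "manifests" summarized only by the lower and upper bound of that field (similar in spirit to the partition summaries an Iceberg manifest list keeps), and counts how many manifests a filter on one value still has to open.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model (not Iceberg code): each "file" is represented only by its value for a
// hypothetical "category" partition field, and each "manifest" keeps just the
// lower/upper bound of that field across the files it tracks.
public class ManifestClusteringDemo {

  record Manifest(String lower, String upper) {
    // Bound-based pruning: a manifest can be skipped only when the value is outside [lower, upper].
    boolean mightContain(String value) {
      return lower.compareTo(value) <= 0 && value.compareTo(upper) <= 0;
    }
  }

  // Packs files, in the given order, into manifests of at most filesPerManifest entries.
  static List<Manifest> pack(List<String> fileCategories, int filesPerManifest) {
    List<Manifest> manifests = new ArrayList<>();
    for (int start = 0; start < fileCategories.size(); start += filesPerManifest) {
      List<String> batch =
          fileCategories.subList(start, Math.min(start + filesPerManifest, fileCategories.size()));
      String lower = batch.stream().min(Comparator.naturalOrder()).orElseThrow();
      String upper = batch.stream().max(Comparator.naturalOrder()).orElseThrow();
      manifests.add(new Manifest(lower, upper));
    }
    return manifests;
  }

  // Number of manifests that cannot be pruned for an equality filter on the field.
  static long manifestsToScan(List<Manifest> manifests, String filterValue) {
    return manifests.stream().filter(m -> m.mightContain(filterValue)).count();
  }

  public static void main(String[] args) {
    // Files arrive interleaved across categories, e.g. from many small appends.
    List<String> files = new ArrayList<>();
    for (int i = 0; i < 12; i++) {
      files.add(List.of("books", "games", "music").get(i % 3));
    }
    List<Manifest> unsorted = pack(files, 4);

    // Clustering: sort by the partition value before packing into manifests.
    List<String> sortedFiles = new ArrayList<>(files);
    sortedFiles.sort(Comparator.naturalOrder());
    List<Manifest> clustered = pack(sortedFiles, 4);

    System.out.println("unsorted:  " + manifestsToScan(unsorted, "games"));
    System.out.println("clustered: " + manifestsToScan(clustered, "games"));
  }
}
```

With this data, every interleaved manifest spans `books` to `music`, so a filter on `games` must open all 3, while the clustered layout gives each manifest a single value and the filter opens only 1. Real gains depend on the table's data layout and on which fields queries actually filter on.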

@hemanthboyina

Contributor Author

@RussellSpitzer @singhpk234 @huaxingao could you please help review? Thanks.

Comment thread on docs/docs/spark-procedures.md (outdated)
| `table` | ✔️ | string | Name of the table to update |
| `use_caching` |  | boolean | Use Spark caching during operation (defaults to false). Enabling caching can increase memory footprint on executors. |
| `spec_id` |  | int | Spec id of the manifests to rewrite (defaults to current spec id) |
| `sort_by` |  | array<string> | List of partition field names to cluster manifests by. Choosing frequently queried partition fields can reduce planning time by skipping unnecessary manifests. If not set, manifests will be sorted by all partition fields in spec order. |

Member


"partition transform names"?

action.specId(specId);
}

if (sortBy != null) {

Member


should we fail on an empty array here?
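One way to answer the empty-array question is to reject it up front. The following is a hypothetical sketch, not the code merged in this PR: `validateSortBy` and the example field names `category` and `event_day` are illustrative assumptions. It treats a missing argument (`null`) as "use the default ordering", mirroring the `sortBy != null` check in the diff context, and fails on an empty array, blank entries, or names that are not partition fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical validation for a sort_by argument; not the code merged in this PR.
public class SortByValidation {

  // Returns null when sort_by was not supplied (caller keeps the default ordering),
  // otherwise the validated list of partition field names in the order given.
  static List<String> validateSortBy(String[] sortBy, Set<String> partitionFieldNames) {
    if (sortBy == null) {
      return null;
    }
    if (sortBy.length == 0) {
      throw new IllegalArgumentException("sort_by must not be an empty array");
    }
    for (String name : sortBy) {
      if (name == null || name.isBlank()) {
        throw new IllegalArgumentException("sort_by contains a null or blank field name");
      }
      if (!partitionFieldNames.contains(name)) {
        throw new IllegalArgumentException(
            "Unknown partition field in sort_by: " + name + ", expected one of " + partitionFieldNames);
      }
    }
    return Arrays.asList(sortBy);
  }

  public static void main(String[] args) {
    // Hypothetical partition field names for illustration only.
    Set<String> fields = Set.of("category", "event_day");
    System.out.println(validateSortBy(new String[] {"category"}, fields));
    try {
      validateSortBy(new String[0], fields);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast on an empty array avoids silently falling back to the default ordering when the caller evidently meant to pass fields.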

@RussellSpitzer RussellSpitzer left a comment

Member


This generally looks good to me; I have a few small nits.

@hemanthboyina

Contributor Author

Thanks for the review, @RussellSpitzer. I have updated the changes.

@RussellSpitzer RussellSpitzer merged commit dd248a9 into apache:main Mar 9, 2026
26 checks passed
@RussellSpitzer

Member

Thanks, @hemanthboyina! Merged.

RjLi13 pushed a commit to RjLi13/iceberg that referenced this pull request Mar 12, 2026
talatuyarer pushed a commit to talatuyarer/iceberg that referenced this pull request Apr 1, 2026

2 participants