Spark: CALL procedure for rewrite_data_files#3375
Conversation
|
Thanks for working on this! Almost forgot we haven't added the call procedure for this yet... I remember Russell had laid out the arguments for the procedure in https://docs.google.com/document/d/1aXo1VzuXxSuqcTzMLSQdnivMVgtLExgDWUFMvWeXRxc, could you follow that proposal? |
|
@jackye1995 : Thanks for sharing this doc. I will look into it and implement as per that. |
|
@jackye1995 , @RussellSpitzer : Also I should copy the same code in spark/v3.2 folder as well ? p.s: Also let me know if any other arguments is missed. |
For "where" can you check and see if you can produce expressions using the Spark session's parser? I think you may be able to convert to expressions using the parser and then translate them using our Spark3 utils ... I'm not sure though. |
6dea11f to
36ed20a
Compare
|
@jackye1995 , @RussellSpitzer , @rdblue : Actually for supporting filter, I have used the spark parser to parse the string as expression. But I got spark catalyst expression. But rewrite data files action needs Iceberg expression instead of spark catalyst expression. |
|
There is a conversion from Spark to Iceberg expressions in |
@rdblue : I think it is spark filter to Iceberg expression conversion. I needed spark expression to Iceberg expression conversion. |
|
@RussellSpitzer , @jackye1995 , @rdblue : I will back sync to v3.0 folder once this PR is reviewed and merged. Thanks. |
2194b42 to
170fe7b
Compare
|
@RussellSpitzer : So, anything else need to be done for this PR or it can be merged ? |
jackye1995
left a comment
There was a problem hiding this comment.
overall looks good to me, just some nit
|
@jackye1995 : Thanks for the review. I have pushed the nit changes also. |
RussellSpitzer
left a comment
There was a problem hiding this comment.
Just had some minor comments which I think we need to wrap up, other than that I think this is looking pretty good. I know it's difficult for us to work this kind of functionality into Iceberg when Spark keeps a lot of these methods from us.
|
@RussellSpitzer : Thanks for the review again. I have addressed the comments (also added a comment reply) and pushed the new changes. |
|
@RussellSpitzer : I have updated the test case with dataframe write instead of sql now. So, I think all comments are handled now. Thanks. |
RussellSpitzer
left a comment
There was a problem hiding this comment.
Thanks so much, this is a great addition to the codebase
|
@RussellSpitzer : Thanks for merging the PR. Appreciate your guidance and patient reply to my questions. It helped in finally avoiding the custom codes 👍🏻 |
Backport of #3375 - Support CALL procedure for rewrite_data_files
Backport of #3375 - Support CALL procedure for rewrite_data_files
Supported the call procedure for
rewrite_data_files.Supported input arguments:
table, [strategy], [sort_order], [options], [where]Supported output arguments:
rewritten_data_files_count,added_data_files_countAdded test cases to cover the scenarios.
Fixes #3355