Spark: Add procedure to publish WAP changes using wap.id #4715

Merged: rdblue merged 1 commit into apache:master from edgarRd:spark-proc-apply-wap-changes on Jul 6, 2022

edgarRd (Contributor) commented May 6, 2022

While the WAP workflow lets users write data via SQL using a wap_id, publishing those changes to the table (the Publish step of the workflow) is much less usable: the user has to look up the snapshot-id programmatically in order to cherry-pick the changes. Ideally, every step of the WAP workflow should be available via SQL, especially since Iceberg already keeps the wap-id => snapshot-id mapping in its own metadata.

This PR proposes a SQL procedure to cherry-pick the changes created with a wap-id. Functionally, it works the same as the cherry-pick procedure, but it takes a wap-id as its argument instead of a snapshot-id. This makes both the Write and Publish steps of WAP available via SQL. The proposed procedure name is publish_changes, but I'm open to suggestions if another name fits better.
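To make the proposed semantics concrete, here is a minimal Python sketch of what publishing by wap-id amounts to: find the staged snapshot whose summary carries the given wap id, then cherry-pick that snapshot-id. The `Snapshot` and `Table` classes and the `find_wap_snapshot`, `cherry_pick`, and `publish_changes` functions are hypothetical stand-ins for illustration, not Iceberg's Java API. The `"wap.id"` summary key is assumed to be the property recorded on snapshots staged by a WAP write, and the toy `cherry_pick` only handles the fast-forward case (the staged snapshot's parent is the current snapshot).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Summary property assumed to be recorded on snapshots staged by a WAP write.
WAP_ID_PROP = "wap.id"


@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    summary: Dict[str, str] = field(default_factory=dict)


@dataclass
class Table:
    snapshots: List[Snapshot]
    current_snapshot_id: Optional[int]


def find_wap_snapshot(table: Table, wap_id: str) -> Snapshot:
    """Resolve a wap-id to the staged snapshot that carries it (first match)."""
    for snap in table.snapshots:
        if snap.summary.get(WAP_ID_PROP) == wap_id:
            return snap
    raise ValueError(f"Cannot apply unknown WAP ID '{wap_id}'")


def cherry_pick(table: Table, snapshot_id: int) -> int:
    """Toy cherry-pick: only fast-forwards when the staged snapshot's parent is current."""
    staged = next(s for s in table.snapshots if s.snapshot_id == snapshot_id)
    if staged.parent_id != table.current_snapshot_id:
        # A real cherry-pick would roughly re-apply the staged changes as a new snapshot.
        raise NotImplementedError("toy model only supports fast-forward")
    table.current_snapshot_id = staged.snapshot_id
    return table.current_snapshot_id


def publish_changes(table: Table, wap_id: str) -> Tuple[int, int]:
    """Publish by wap-id: look up the snapshot-id, then delegate to cherry-pick."""
    staged = find_wap_snapshot(table, wap_id)
    current = cherry_pick(table, staged.snapshot_id)
    return staged.snapshot_id, current


# Snapshot 2 was written with wap.id = "w1" but not yet published.
table = Table(
    snapshots=[Snapshot(1, None), Snapshot(2, 1, {WAP_ID_PROP: "w1"})],
    current_snapshot_id=1,
)
print(publish_changes(table, "w1"))  # (2, 2)
```

Returning both the source and the resulting current snapshot-id mirrors the idea that publish_changes is just cherry-pick with a different lookup key. From Spark SQL the procedure would be called with a table name and a wap id, along the lines of `CALL catalog.system.publish_changes('db.tbl', 'wap_id_1')` (catalog, table, and wap id are placeholders); check the Iceberg Spark procedures documentation for the exact arguments and output columns.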

Thanks.

BTW, I considered extending the current cherry-pick procedure, but I figured the implementation wouldn't look much different, nor would we end up with less code, since a few things would need to be overridden in an already simple procedure; mostly it would have just coupled the two implementations. Open to suggestions.

github-actions (bot) added the spark label May 6, 2022
edgarRd (Contributor, Author) commented May 16, 2022

PTAL @rdblue when you have a chance. Thanks!

rdblue (Contributor) commented Jun 29, 2022

@edgarRd, sorry I missed this. Can you rebase and I'll review?

edgarRd force-pushed the spark-proc-apply-wap-changes branch from a57181b to 1273c85 June 29, 2022 22:56

edgarRd (Contributor, Author) commented Jun 29, 2022

Thanks @rdblue - I've rebased the branch.

rdblue (Contributor) commented Jul 6, 2022

@RussellSpitzer, can you also take a look at this?

rdblue merged commit 56c1993 into apache:master Jul 6, 2022

rdblue (Contributor) commented Jul 6, 2022

Thanks, @edgarRd! Could you also port this to Spark 3.3?

edgarRd (Contributor, Author) commented Jul 6, 2022

Thank you, @rdblue - I'll send a PR for porting to Spark 3.3 in a bit.

singhpk234 (Contributor) commented

Thanks @edgarRd!!

Should we also add this procedure to the Spark Procedures doc?

edgarRd (Contributor, Author) commented Jul 11, 2022

> Thanks @edgarRd!!
>
> Should we also add this procedure to the Spark Procedures doc?

Yeah, good catch! I can add the docs in the follow up PR I have for adding it to Spark 3.3: #5223
