Spark 3.3: Fix predicate pushdown for copy-on-write MERGE commands #6633
val readRelation = buildRelationWithAttrs(relation, operationTable, metadataAttrs)
val readAttrs = readRelation.output

val (targetCond, joinCond) = splitMergeCond(cond, readRelation)
I reverted the changes from #6534 for copy-on-write operations. Pushing the join condition into a filter on the left side is not safe in LeftOuter and FullOuter joins: it changes the output and can lead to losing records that did not match the condition (see the new test).
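To illustrate the point above, here is a minimal sketch using plain Scala collections rather than Spark or Iceberg code. The `Target` and `Source` row types, the `leftOuterJoin` helper, and the `t.id > 1` predicate are all hypothetical and exist only for this example. It assumes the target table is the left side of the join and compares keeping the whole condition in the join against applying its target-only part as a filter first.

```scala
object PushdownSketch {
  // Hypothetical row types, for illustration only.
  case class Target(id: Int, value: String)
  case class Source(id: Int, value: String)

  // Naive left outer join: every left row appears in the output at least once,
  // paired with None when no right row satisfies the condition.
  def leftOuterJoin(
      left: Seq[Target],
      right: Seq[Source])(
      cond: (Target, Source) => Boolean): Seq[(Target, Option[Source])] = {
    left.flatMap { t =>
      val matches = right.filter(s => cond(t, s))
      if (matches.isEmpty) Seq((t, None)) else matches.map(s => (t, Some(s)))
    }
  }

  val target = Seq(Target(1, "a"), Target(2, "b"), Target(3, "c"))
  val source = Seq(Source(1, "x"), Source(2, "y"))

  // A merge-style condition with a target-only conjunct (t.id > 1).
  val targetOnly: Target => Boolean = t => t.id > 1
  val fullCond: (Target, Source) => Boolean =
    (t, s) => t.id == s.id && targetOnly(t)

  // Whole condition evaluated inside the join: all target rows survive.
  val kept = leftOuterJoin(target, source)(fullCond)

  // Target-only conjunct applied as a filter before the join:
  // target row 1 never reaches the join and disappears from the output.
  val pushed =
    leftOuterJoin(target.filter(targetOnly), source)((t, s) => t.id == s.id)

  def main(args: Array[String]): Unit = {
    println(kept.map(_._1.id))   // target ids 1, 2, 3
    println(pushed.map(_._1.id)) // target ids 2, 3
  }
}
```

In a copy-on-write rewrite, a target row that drops out of the join output this way would not be written back, which is the kind of record loss described in the comment.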
258d3c4 to 4ac7dfa
amogh-jahagirdar left a comment
Thanks for the explanation, @aokolnychyi. I added the test before applying the fix from this PR and stepped through the debugger to see why it failed. I also stepped through the MoR pushdown cases, and I understand it better now. This fix makes sense to me, thanks!
RussellSpitzer left a comment
We did an in-depth walkthrough of this code, and this looks like the right thing to do.
Thank you, @amogh-jahagirdar @RussellSpitzer!
This PR fixes predicate pushdown for copy-on-write MERGE commands, which was broken after #6534. This change contains a test that would previously fail and lead to a data correctness issue.