Spark 3.5: Don't change table distribution when only altering local order#10774
Conversation
29f80d6 to
66d58ed
Compare
|
@RussellSpitzer could you please take another look? |
|
@RussellSpitzer @aokolnychyi @szehon-ho would you mind taking another look? |
| } else if (orderingSpec.UNORDERED != null || orderingSpec.LOCALLY != null) { | ||
| DistributionMode.NONE | ||
| Some(DistributionMode.HASH) | ||
| } else if (orderingSpec.UNORDERED != null) { |
There was a problem hiding this comment.
Why not just return None when distributionSpec not set here? I thought that's more inline with the goal of the pr (to not override default unless explicitly set).
There was a problem hiding this comment.
I realize it would break a few tests and is a big behavior change. So @RussellSpitzer @aokolnychyi let me know if you disagree?
There was a problem hiding this comment.
Which do you mean will break a few tests and is a big change?
There was a problem hiding this comment.
val distributionMode = if (distributionSpec != null) {
DistributionMode.HASH
} else if (orderingSpec.UNORDERED != null || orderingSpec.LOCALLY != null) {
DistributionMode.NONE
} else {
DistributionMode.RANGE
}
makes more sense to me as:
val distributionMode = if (distributionSpec != null) {
Some(DistributionMode.HASH)
} else {
None
}
There was a problem hiding this comment.
I thought the goal is to not set distributionMode , if its not in the ALTER statement. Hence why I thought the above code, which returns None in those cases unless its explicitly present in ALTER statement.
The current code works for the exact statement ALTER TABLE LOCALLY UNORDERED, but not for other ALTER's that dont have distributionMode?
There was a problem hiding this comment.
I believe this function is used by other pieces of code. Specifically the Create statements where RANGE is set if there is an ordering. Maybe it makes sense to separate it out into a Distribution and ordering for ALTER and one for CREATE?
There was a problem hiding this comment.
@RussellSpitzer This method is for ALTER only.
@szehon-ho The result of this method can be one of HASH, NONE, RANGE or unchanged depending on the combination of distributionSpec and orderSpec. Thus, we cannot just return None (which mean unchanged) when distributionSpec is not set.
|
@RussellSpitzer Could you please take another look? It might be valuable to be included in 1.7 since it's a long-standing issue. |
|
Yes I think we are good here! Thanks @manuzhang for the patch and @szehon-ho for the review |
As discussed with @aokolnychyi @szehon-ho and @RussellSpitzer in #10647 (comment)