Skip to content

API: Add expression equivalence testing#4947

Merged
rdblue merged 6 commits into
apache:masterfrom
rdblue:add-expression-equivalence
Jun 3, 2022
Merged

API: Add expression equivalence testing#4947
rdblue merged 6 commits into
apache:masterfrom
rdblue:add-expression-equivalence

Conversation

@rdblue

@rdblue rdblue commented Jun 2, 2022

Copy link
Copy Markdown
Contributor

This adds ExpressionUtil.equivalent and ExpressionUtil.selectsPartitions to determine whether an expression will select whole files in a partition spec. This is needed to determine whether a filter is guaranteed to delete only whole files, or whether a row-level plan should be used. It is also needed to implement aggregate pushdown with non-trivial filters because filters must match whole files in order to use file metadata to satisfy a query.

@github-actions github-actions Bot added the API label Jun 2, 2022
@rdblue rdblue force-pushed the add-expression-equivalence branch from a304dbd to 0ba6dd5 Compare June 2, 2022 20:48
static <T> Set<T> setOf(Iterable<Literal<T>> literals) {
Literal<T> lit = Iterables.get(literals, 0);
if (lit instanceof Literals.StringLiteral && lit.value() instanceof CharSequence) {
if (lit instanceof Literals.StringLiteral) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always true because StringLiteral wraps a CharSequence.

Comment thread api/src/main/java/org/apache/iceberg/expressions/BoundLiteralPredicate.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/expressions/BoundTransform.java Outdated
* @param other another expression
* @return true if the expressions are equivalent
*/
default boolean isEquivalentTo(Expression other) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases we use is prefix for boolean methods and in which no?
We are not very consistent in the code right now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I typically try to make the method name form a natural sentence in English, like if (someFilter.isEquivalentTo(other)) reads like "if some filter is equivalent to other". In some cases it makes more sense to use other words, like has.

I wasn't sure what to do with ExpressionUtil.equivalent since it compares two things, but areEquivalent(f1, f2, schema) doesn't feel right. I just left that as ExpressionUtil.equivalent. Suggestions are welcome!

Comment thread api/src/main/java/org/apache/iceberg/expressions/Expression.java
Comment thread api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java Outdated
* @param spec a partition spec
* @return true if the expression will select whole partitions in the given spec
*/
public static boolean selectsPartitions(Expression expr, PartitionSpec spec) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct this should allow us to optimize metadata-only deletes by handling obvious cases without the need to plan files with potential matches and invoking evaluators?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll follow up with that once this is merged.

Comment thread api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java Outdated

@aokolnychyi aokolnychyi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. One question about case sensitivity when building projections.

@rdblue rdblue merged commit ff7c217 into apache:master Jun 3, 2022
@rdblue

rdblue commented Jun 3, 2022

Copy link
Copy Markdown
Contributor Author

Thanks for reviewing, @aokolnychyi!

@wypoon

wypoon commented Jun 6, 2022

Copy link
Copy Markdown
Contributor

@rdblue after I picked up this commit in master, my local build fails with

Execution failed for task ':iceberg-api:revapi'.
> There were Java public API/ABI breaks reported by revapi:
  
  java.method.addedToInterface: Method was added to an interface.
  
  old: <none>
  new: method boolean org.apache.iceberg.expressions.BoundTerm<T>::isEquivalentTo(org.apache.iceberg.expressions.BoundTerm<?>)
  
  SOURCE: BREAKING, BINARY: NON_BREAKING, SEMANTIC: POTENTIALLY_BREAKING
  
  From old archive: <none>
  From new archive: iceberg-api-0.14.0-SNAPSHOT.jar
  
  If this is an acceptable break that will not harm your users, you can ignore it in future runs like so for:
  
    * Just this break:
        ./gradlew :iceberg-api:revapiAcceptBreak --justification "{why this break is ok}" \
          --code "java.method.addedToInterface" \
          --new "method boolean org.apache.iceberg.expressions.BoundTerm<T>::isEquivalentTo(org.apache.iceberg.expressions.BoundTerm<?>)"
    * All breaks in this project:
        ./gradlew :iceberg-api:revapiAcceptAllBreaks --justification "{why this break is ok}"
    * All breaks in all projects:
        ./gradlew revapiAcceptAllBreaks --justification "{why this break is ok}"
  ----------------------------------------------------------------------------------------------------

Do you need to add something to .palantir/revapi.yml?

@rdblue

rdblue commented Jun 6, 2022

Copy link
Copy Markdown
Contributor Author

@wypoon, I merged a PR that added BoundTerm.isEquivalentTo earlier today: c9efc4b

@kbendick, can you make sure we're running the checks on PRs correctly?

@wypoon

wypoon commented Jun 6, 2022

Copy link
Copy Markdown
Contributor

@wypoon, I merged a PR that added BoundTerm.isEquivalentTo earlier today: c9efc4b

Thanks. I haven't picked up #4965. I'll do so then.

namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022
namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants