Skip to content

Spark 3.4: WAP branch not propagated when using DELETE without WHERE#7900

Merged
RussellSpitzer merged 8 commits into
apache:masterfrom
rakesh-das08:iceberg_7851
Jul 7, 2023
Merged

Spark 3.4: WAP branch not propagated when using DELETE without WHERE#7900
RussellSpitzer merged 8 commits into
apache:masterfrom
rakesh-das08:iceberg_7851

Conversation

@rakesh-das08

@rakesh-das08 rakesh-das08 commented Jun 24, 2023

Copy link
Copy Markdown
Contributor

This resolves #7851

@github-actions github-actions Bot added the spark label Jun 24, 2023
@rakesh-das08 rakesh-das08 changed the title Fix Spark WAP branch not propagated when using DELETE without WHERE Spark: WAP branch not propagated when using DELETE without WHERE Jun 24, 2023
@rakesh-das08

Copy link
Copy Markdown
Contributor Author

@nastra Could you please review

@nastra nastra left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks mostly good to me with a few small things. Thanks for working on this @rakesh-das08

Comment thread spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java Outdated
@nastra

nastra commented Jun 29, 2023

Copy link
Copy Markdown
Contributor

@jackye1995 @aokolnychyi @RussellSpitzer could any of you also please review this?

@rakesh-das08

Copy link
Copy Markdown
Contributor Author

@nastra Thank you for reviewing this :)

spark, DataSourceV2Relation.create(metadataTable, Some.empty(), Some.empty(), options));
}

public static String wapBranch(SparkSession spark, String branch) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's clear enough from this name what this method is doing, can we add a java doc or possibly a new name?

Looks like it returns the WAP Branch if it is configured in the Spark conf and then the string that is passed in otherwise?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI Russell, I have added Javadoc for this method. Please review.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is much clearer. What I think this method is more likely defined as "determine write branch?" It is not restricted to WAP and also validates the configuration of non WAP config? It seems like it handles multiple concerns here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @RussellSpitzer .
Would request your review once more. Made some changes in Javadoc

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having this function do both things we should probably have two functions.

  1. Validate configuration
  2. Choose branch

In general if our javadoc says a function is doing two different things, it means that function needs to be split.

so code could look like

deleteWhere() {
   validateConfig
   
   
   branch = branch != null ? branch : wapBranch
   if (branch != null) {
     deleteFiles.toBranch(branch)
   }

@RussellSpitzer

Copy link
Copy Markdown
Member

Note I think this pr should also be title Spark 3.4:
Since we'll need to backport it to the other spark versions I assume

@rakesh-das08 rakesh-das08 changed the title Spark: WAP branch not propagated when using DELETE without WHERE Spark 3.4: WAP branch not propagated when using DELETE without WHERE Jul 3, 2023
@rakesh-das08

Copy link
Copy Markdown
Contributor Author

@RussellSpitzer, could you please review. Made changes as per your review comments.

@nastra nastra added this to the Iceberg 1.3.1 milestone Jul 7, 2023
@nastra

nastra commented Jul 7, 2023

Copy link
Copy Markdown
Contributor

@RussellSpitzer could you please re-review this one? I think it would be great to get this into 1.3.1


@Test
public void testDeleteToCustomWapBranchWithoutWhereClause() throws NoSuchTableException {
assumeThat(branch)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this assumption? Shouldn't this be something we are configuring in the test?

@RussellSpitzer

Copy link
Copy Markdown
Member

@nastra told me this is matching the style used elsewhere, so let's merge for now and fix all of the style issues later in a followup.

@RussellSpitzer RussellSpitzer merged commit 3a9516f into apache:master Jul 7, 2023
@rakesh-das08

Copy link
Copy Markdown
Contributor Author

Thank you @RussellSpitzer for reviewing this and merging.
Thank you @nastra for reviewing this and getting this merged with clarifications. Appreciate it!

@rakesh-das08 rakesh-das08 deleted the iceberg_7851 branch July 8, 2023 06:25
laithalzyoud added a commit to revolut-engineering/iceberg that referenced this pull request Jan 30, 2024
* Hive: Set commit state as Unknown before throwing CommitStateUnknownException (apache#7931) (apache#8029)

* Spark 3.4: WAP branch not propagated when using DELETE without WHERE (apache#7900) (apache#8028)

* Core: Include all reachable snapshots with v1 format and REF snapshot mode (apache#7621) (apache#8027)

* Spark 3.3: Backport 'WAP branch not propagated when using DELETE without WHERE' (apache#8033) (apache#8036)

* Flink: remove the creation of default database in FlinkCatalog open method (apache#7795) (apache#8039)

* Core: Handle optional fields (apache#8050) (apache#8064)

* Core: Handle allow optional fields

We expect:

- current-snapshot-id
- properties
- snapshots

to be there, but they are actually optional.

* Use AssertJ

* Core: Abort file groups should be under same lock as committerService (apache#7933) (apache#8060)

* Spark 3.4: Fix rewrite_position_deletes for certain partition types (apache#8059)

* Spark 3.3: Fix rewrite_position_deletes for certain partition types (apache#8059) (apache#8069)

* Spark: Add actions for disater recovery.

* Fix the compile error.

* Fix merge conflicts and formatting

* All tests are working and code integrated with Spark 3.3

* Fix union error and snapshots test

* Fix Spark broadcast error

* Add RewritePositionDeleteFilesSparkAction

---------

Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Xianyang Liu <liu-xianyang@hotmail.com>
Co-authored-by: Szehon Ho <szehon.apache@gmail.com>
Co-authored-by: Yufei Gu <yufei_gu@apple.com>
Co-authored-by: yufeigu <yufei@apache.org>
Co-authored-by: Laith Alzyoud <laith.alzyoud@revolut.com>
Co-authored-by: vaultah <4944562+vaultah@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Spark WAP branch not propagated when using DELETE without WHERE

3 participants