Skip to content

AWS: Use Glue optimistic locking while updating table#4166

Merged
jackye1995 merged 5 commits into
apache:masterfrom
rajarshisarkar:glue-optimistic-locking
Feb 23, 2022
Merged

AWS: Use Glue optimistic locking while updating table#4166
jackye1995 merged 5 commits into
apache:masterfrom
rajarshisarkar:glue-optimistic-locking

Conversation

@rajarshisarkar

@rajarshisarkar rajarshisarkar commented Feb 18, 2022

Copy link
Copy Markdown
Contributor

Use Glue optimistic locking while updating table.

Retry logs in case of concurrent updates:

22/02/21 15:37:10 WARN Tasks: Retrying task after failure: Cannot commit my_catalog.db_iceberg.table because Glue detected concurrent update
org.apache.iceberg.exceptions.CommitFailedException: Cannot commit my_catalog.db_iceberg.table because Glue detected concurrent update
	at org.apache.iceberg.aws.glue.GlueTableOperations.doCommit(GlueTableOperations.java:127)
	at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:127)
	at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$5(BaseTransaction.java:377)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
	at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:362)

@github-actions github-actions Bot added the AWS label Feb 18, 2022
@jackye1995

Copy link
Copy Markdown
Contributor

@rajarshisarkar rajarshisarkar force-pushed the glue-optimistic-locking branch from 7fb0e65 to eff9126 Compare February 18, 2022 18:35
@rajarshisarkar rajarshisarkar force-pushed the glue-optimistic-locking branch from eff9126 to ae1c9d4 Compare February 18, 2022 18:43

@jackye1995 jackye1995 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, we are finally get this long-waiting feature in!

I think we need to do a bit more. Now we don't really need a lock manager if lock-impl is not set. I think using a in-memory lock is also not needed. Yes it adds some guard against commits happening in the same process, but it's also consuming daemon threads to run heartbeats and these can go wrong. The Glue service is scalable enough to handle concurrent commits at the service level so I think it's better if we implement a no-op lock manager for GlueCatalog and use that instead of the default in-memory lock manager if user did not specify any lock configuration. Any thoughts about this?

Comment thread aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java Outdated
@rdblue

rdblue commented Feb 20, 2022

Copy link
Copy Markdown
Contributor

Now we don't really need a lock manager if lock-impl is not set.

True, but since this is using reflection and may not actually call versionId I think we need to be careful. We could check whether LOAD_VERSION_ID is a noop and require a lock implementation.

@rajarshisarkar

Copy link
Copy Markdown
Contributor Author

Now we don't really need a lock manager if lock-impl is not set. I think using a in-memory lock is also not needed.
If we implement a no-op lock manager for GlueCatalog and use that instead of the default in-memory lock manager if user did not specify any lock configuration. Any thoughts about this?

Yes, we shouldn't require any lock managers unless we encounter the noop scenario. We can fallback to in-memory lock manager in that case.

True, but since this is using reflection and may not actually call versionId I think we need to be careful. We could check whether LOAD_VERSION_ID is a noop and require a lock implementation.

Yes, I agree.

@github-actions github-actions Bot added the core label Feb 21, 2022
Comment thread core/src/main/java/org/apache/iceberg/util/LockManagers.java Outdated
@rajarshisarkar rajarshisarkar marked this pull request as ready for review February 22, 2022 11:05
Comment thread core/src/main/java/org/apache/iceberg/util/LockManagers.java

@jackye1995 jackye1995 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the work! @rajarshisarkar , and thanks @rdblue for the review

@findepi

findepi commented Apr 27, 2022

Copy link
Copy Markdown
Member

As a follow-up, can we deprecate and eventually remove lock manager for Glue, and rely on version IDs?
this would ensure compatibility with other applications.

cc @rdblue @jackye1995 @alexjo2144

@rdblue

rdblue commented Apr 28, 2022

Copy link
Copy Markdown
Contributor

@findepi, I think the problem is that we don't know that older clients have been updated to use the new commit logic in Glue. I would continue to respect any locking configuration if the user has set it up.

@findepi

findepi commented Apr 29, 2022

Copy link
Copy Markdown
Member

Currently we have a fragmented landscape, where users may use different locking mechanisms. This can lead to data loss when different apps use different locking mechanisms, providing false sense of security/
Ultimately, a unified approach is the best for the users -- I guess that's something we all would agree with.
So then the question is how to go from where we're today to where we want to be in the long term, i.e. plan the transition period, announcements, deprecation strategy, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants