Skip to content

AWS: Add S3 Tagging#4259

Merged
jackye1995 merged 2 commits into
apache:masterfrom
rajarshisarkar:s3-tagging
Mar 14, 2022
Merged

AWS: Add S3 Tagging#4259
jackye1995 merged 2 commits into
apache:masterfrom
rajarshisarkar:s3-tagging

Conversation

@rajarshisarkar

@rajarshisarkar rajarshisarkar commented Mar 3, 2022

Copy link
Copy Markdown
Contributor

This change adds S3 Tags to the objects while writing using S3FileIO. Users can pass their custom tags as part of the catalog properties.

Spark SQL launch command:

sh spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.my_catalog.warehouse=s3://iceberg-warehouse/s3-tagging \
    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key=my_val \
    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key2=my_val2

Tags added in S3:

aws s3api get-object-tagging --bucket iceberg-warehouse --key s3-tagging/data/00000-0-593dfa9d-872f-4683-986f-ea220b7c875d-00001.parquet
{
    "TagSet": [
        {
            "Key": "my_key2",
            "Value": "my_val2"
        },
        {
            "Key": "my_key",
            "Value": "my_val"
        }
    ]
}

cc: @jackye1995 @arminnajafi @singhpk234 @amogh-jahagirdar @xiaoxuandev @yyanyy

@github-actions github-actions Bot added the AWS label Mar 3, 2022
Comment thread aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/s3/S3Tag.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/s3/S3Tag.java Outdated
@rajarshisarkar rajarshisarkar marked this pull request as ready for review March 7, 2022 13:17
Comment thread aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java Outdated
@rajarshisarkar

rajarshisarkar commented Mar 9, 2022

Copy link
Copy Markdown
Contributor Author

I took the idea of parsing the properties beginning with a prefix from #4011

Refer method name: PropertyUtil.propertiesWithPrefix

@jackye1995 jackye1995 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic mostly looks correct to me, added some comments

Comment thread aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/s3/BaseS3File.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java Outdated
Comment thread aws/src/test/java/org/apache/iceberg/aws/s3/TestS3OutputStream.java Outdated

@singhpk234 singhpk234 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @rajarshisarkar !!! for all the work.

Comment thread core/src/main/java/org/apache/iceberg/util/PropertyUtil.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/util/PropertyUtil.java Outdated
Comment thread aws/src/test/java/org/apache/iceberg/aws/s3/TestS3OutputStream.java Outdated
Comment thread aws/src/test/java/org/apache/iceberg/aws/s3/TestS3OutputStream.java Outdated
Comment thread aws/src/test/java/org/apache/iceberg/aws/s3/TestS3OutputStream.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java Outdated
Comment thread aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputFile.java

@jackye1995 jackye1995 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me!

@amogh-jahagirdar amogh-jahagirdar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Comment thread aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java Outdated
@jackye1995 jackye1995 merged commit 10f8509 into apache:master Mar 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants