[DOCS] Document how to add per-catalog hadoop conf values with Spark #2922

Merged
rdblue merged 2 commits into apache:master from kbendick:document-spark-catalog-hadoop-configuration
Aug 6, 2021
@kbendick kbendick commented Aug 2, 2021

Contributor

Adds a section to the Spark documentation on the website about how to override hadoop configuration values per catalog.

This is a very simple explanation and I'm open to discussion on what should be added.

This closes issue #2907

cc @rdblue

@github-actions github-actions Bot added the docs label Aug 2, 2021
@kbendick kbendick changed the title from "[SITE][DOCS] Document how to add per-catalog hadoop conf values with Spark" to "[DOCS] Document how to add per-catalog hadoop conf values with Spark" Aug 2, 2021

kbendick commented Aug 3, 2021

Contributor Author

cc @RussellSpitzer @flyrain @raptond

Similar to configuring Hadoop properties by using `spark.hadoop.*`, it's possible to set per-catalog Hadoop configuration values when using Spark by adding the property for the catalog with the prefix `spark.sql.catalog.(catalog-name).hadoop.*`. These properties will take precedence over values configured globally using `spark.hadoop.*` and will only affect Iceberg tables.
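The precedence rule described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not Iceberg's or Spark's actual implementation: the helper name `effective_hadoop_conf`, the catalog name `other_catalog`, and the global endpoint `http://s3.example.com` are assumptions made up for the example.

```python
from typing import Dict

# Illustrative model only; this is NOT how Iceberg or Spark implement it.
GLOBAL_PREFIX = "spark.hadoop."


def effective_hadoop_conf(spark_conf: Dict[str, str], catalog_name: str) -> Dict[str, str]:
    """Return the Hadoop settings one catalog would see (hypothetical helper)."""
    catalog_prefix = f"spark.sql.catalog.{catalog_name}.hadoop."
    merged: Dict[str, str] = {}
    # Apply the global spark.hadoop.* values first.
    for key, value in spark_conf.items():
        if key.startswith(GLOBAL_PREFIX):
            merged[key[len(GLOBAL_PREFIX):]] = value
    # Apply per-catalog values second so they override the global ones.
    for key, value in spark_conf.items():
        if key.startswith(catalog_prefix):
            merged[key[len(catalog_prefix):]] = value
    return merged


spark_conf = {
    # Hypothetical global values, shared by every catalog.
    "spark.hadoop.fs.s3a.endpoint": "http://s3.example.com",
    "spark.hadoop.fs.s3a.path.style.access": "true",
    # Per-catalog override for the catalog named hadoop_prod.
    "spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint": "http://aws-local:9000",
}

prod_conf = effective_hadoop_conf(spark_conf, "hadoop_prod")
other_conf = effective_hadoop_conf(spark_conf, "other_catalog")

print(prod_conf["fs.s3a.endpoint"])   # http://aws-local:9000
print(other_conf["fs.s3a.endpoint"])  # http://s3.example.com
```

Only `hadoop_prod` sees the overridden endpoint; a catalog without its own `hadoop.*` entries falls back to the global value.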

```plain
spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint = http://aws-local:9000
```

Contributor

Maybe add an example for `hadoop.hive.metastore.uris`, which is one of the most common use cases here.

Contributor Author

Sure. I will update to that instead.

Contributor Author

Sorry for the late response. I thought I had hit comment, but I had not.

Wouldn't Hive metastore URIs be set via the catalog's existing exposed `uri` parameter? E.g. `spark.sql.catalog.(catalog-name).uri`: https://github.com/apache/iceberg/blame/master/site/docs/spark-configuration.md#L60

Contributor Author

Maybe I'll use `hadoop.hive.metastore.kerberos.principal=hadoop/_HOST@REALM` instead?

Contributor

Yes, I don't think that we want to point to the metastore URI because that's what our `uri` property overrides.

@rdblue rdblue merged commit e315d65 into apache:master Aug 6, 2021

rdblue commented Aug 6, 2021

Contributor

Thanks for fixing this, @kbendick!

@kbendick kbendick deleted the document-spark-catalog-hadoop-configuration branch August 10, 2021 06:11