[DOCS] Document how to add per-catalog hadoop conf values with Spark by kbendick · Pull Request #2922 · apache/iceberg

kbendick · 2021-08-02T19:43:19Z

Adds a section to the Spark documentation on the website describing how to override Hadoop configuration values per catalog.

This is a very simple explanation and I'm open to discussion on what should be added.

This closes issue #2907

cc @rdblue


kbendick · 2021-08-03T22:06:13Z

cc @RussellSpitzer @flyrain @raptond

flyrain · 2021-08-03T22:15:12Z

+Similar to configuring Hadoop properties by using `spark.hadoop.*`, it's possible to set per-catalog Hadoop configuration values when using Spark by adding the property for the catalog with the prefix `spark.sql.catalog.(catalog-name).hadoop.*`. These properties will take precedence over values configured globally using `spark.hadoop.*` and will only affect Iceberg tables.
+
+```plain
+spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint = http://aws-local:9000
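A minimal Python sketch of the precedence rule stated in the proposed text above. For a given catalog, it starts from the global `spark.hadoop.*` values and then overlays that catalog's `spark.sql.catalog.(catalog-name).hadoop.*` values, so the per-catalog value wins only for that catalog. The function `effective_hadoop_conf`, the catalog name `other_catalog`, and the endpoint `http://global-s3:9000` are hypothetical. This models the documented behavior; it is not Iceberg's implementation.

```python
# Hypothetical model of the documented precedence rule, NOT Iceberg's code:
# per-catalog keys under spark.sql.catalog.<name>.hadoop.* override global
# spark.hadoop.* keys, and only for the catalog they are set on.
from typing import Dict

GLOBAL_PREFIX = "spark.hadoop."


def effective_hadoop_conf(spark_conf: Dict[str, str], catalog: str) -> Dict[str, str]:
    """Return the Hadoop properties the catalog named `catalog` would see."""
    catalog_prefix = f"spark.sql.catalog.{catalog}.hadoop."
    resolved: Dict[str, str] = {}
    # 1. Start from the session-wide values set with spark.hadoop.*
    for key, value in spark_conf.items():
        if key.startswith(GLOBAL_PREFIX):
            resolved[key[len(GLOBAL_PREFIX):]] = value
    # 2. Overlay this catalog's own hadoop.* values; they win on conflicts.
    for key, value in spark_conf.items():
        if key.startswith(catalog_prefix):
            resolved[key[len(catalog_prefix):]] = value
    return resolved


if __name__ == "__main__":
    conf = {
        "spark.hadoop.fs.s3a.endpoint": "http://global-s3:9000",
        "spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint": "http://aws-local:9000",
    }
    print(effective_hadoop_conf(conf, "hadoop_prod")["fs.s3a.endpoint"])  # http://aws-local:9000
    print(effective_hadoop_conf(conf, "other_catalog")["fs.s3a.endpoint"])  # http://global-s3:9000
```

Applying the two passes in a fixed order keeps the result independent of the order in which keys appear in the Spark configuration.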


Maybe add an example for `hadoop.hive.metastore.uris`, which is one of the most common use cases here.

Sure. I will update to that instead.

Sorry for the late response; I thought I had hit comment, but I hadn't.

Wouldn't Hive metastore URIs be set via the catalog's existing `uri` parameter, e.g. `spark.sql.catalog.(catalog-name).uri`? See https://github.com/apache/iceberg/blame/master/site/docs/spark-configuration.md#L60

Perhaps I'll use `hadoop.hive.metastore.kerberos.principal=hadoop/_HOST@REALM` instead?

Yes, I don't think that we want to point to the metastore URI because that's what our uri property overrides.
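A simplified Python model of the point raised in this thread, assuming (as the comment above says) that a Hive catalog's `uri` property is applied on top of the catalog's Hadoop configuration. The function `hive_catalog_conf`, the catalog name `hive_prod`, and the host names are hypothetical; this sketches the described behavior and is not Iceberg's `HiveCatalog` code.

```python
# Simplified, hypothetical model of the behavior described in this thread:
# a Hive catalog's `uri` property is applied on top of the catalog's Hadoop
# conf, so a per-catalog hadoop.hive.metastore.uris value is overridden.
# NOT Iceberg's HiveCatalog code; names and hosts are illustrative only.
from typing import Dict


def hive_catalog_conf(hadoop_conf: Dict[str, str], catalog_props: Dict[str, str]) -> Dict[str, str]:
    """Return the Hadoop/Hive conf a Hive catalog would end up using."""
    conf = dict(hadoop_conf)  # copy so the caller's conf is not mutated
    uri = catalog_props.get("uri")
    if uri is not None:
        # The catalog-level `uri` wins over any hive.metastore.uris value.
        conf["hive.metastore.uris"] = uri
    return conf


if __name__ == "__main__":
    per_catalog_hadoop = {
        # Would come from spark.sql.catalog.hive_prod.hadoop.hive.metastore.uris
        "hive.metastore.uris": "thrift://ignored-host:9083",
        # Would come from spark.sql.catalog.hive_prod.hadoop.hive.metastore.kerberos.principal
        "hive.metastore.kerberos.principal": "hadoop/_HOST@REALM",
    }
    props = {"uri": "thrift://metastore-host:9083"}  # spark.sql.catalog.hive_prod.uri
    conf = hive_catalog_conf(per_catalog_hadoop, props)
    print(conf["hive.metastore.uris"])  # thrift://metastore-host:9083
    print(conf["hive.metastore.kerberos.principal"])  # hadoop/_HOST@REALM
```

Under this model, a setting such as the Kerberos principal makes a better example for the `hadoop.*` prefix, since no dedicated catalog property shadows it.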

rdblue · 2021-08-06T21:14:42Z

Thanks for fixing this, @kbendick!

[SITE] Document the ability to add per-catalog hadoop configuration when using Spark

aec8b68

github-actions Bot added the docs label Aug 2, 2021

Fix typo

93d6a7d

kbendick changed the title ~~[SITE][DOCS] Document how to add per-catalog hadoop conf values with Spark~~ [DOCS] Document how to add per-catalog hadoop conf values with Spark Aug 2, 2021

flyrain reviewed Aug 3, 2021


rdblue approved these changes Aug 6, 2021


rdblue merged commit e315d65 into apache:master Aug 6, 2021

kbendick deleted the document-spark-catalog-hadoop-configuration branch August 10, 2021 06:11

rdblue mentioned this pull request Aug 17, 2021

Add 0.12.0 release notes pt 2 #2986

Merged

rdblue merged 2 commits into apache:master from kbendick:document-spark-catalog-hadoop-configuration

kbendick commented Aug 2, 2021

kbendick commented Aug 3, 2021

flyrain Aug 3, 2021

kbendick Aug 4, 2021

kbendick Aug 6, 2021

kbendick Aug 6, 2021

rdblue Aug 6, 2021

rdblue commented Aug 6, 2021

3 participants
