[DOCS] Document how to add per-catalog hadoop conf values with Spark #2922
Similar to configuring Hadoop properties by using `spark.hadoop.*`, it's possible to set per-catalog Hadoop configuration values when using Spark by adding the property for the catalog with the prefix `spark.sql.catalog.(catalog-name).hadoop.*`. These properties will take precedence over values configured globally using `spark.hadoop.*` and will only affect Iceberg tables.

```plain
spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint = http://aws-local:9000
```
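
The precedence rule described above can be illustrated with a small pure-Python sketch. It is not Iceberg's or Spark's implementation; `effective_hadoop_conf` is a hypothetical helper, and `http://global-endpoint:9000` and `other_catalog` are made-up values used only to show a per-catalog entry winning over a global `spark.hadoop.*` entry.

```python
# Illustrative sketch only: NOT how Iceberg or Spark resolve configuration internally.
GLOBAL_PREFIX = "spark.hadoop."


def effective_hadoop_conf(spark_conf, catalog_name):
    """Hypothetical helper: Hadoop properties as one catalog would see them."""
    catalog_prefix = f"spark.sql.catalog.{catalog_name}.hadoop."
    merged = {}
    # Start from the global spark.hadoop.* values.
    for key, value in spark_conf.items():
        if key.startswith(GLOBAL_PREFIX):
            merged[key[len(GLOBAL_PREFIX):]] = value
    # Apply per-catalog values second so they win on conflict.
    for key, value in spark_conf.items():
        if key.startswith(catalog_prefix):
            merged[key[len(catalog_prefix):]] = value
    return merged


spark_conf = {
    # Assumed global value, invented for this example.
    "spark.hadoop.fs.s3a.endpoint": "http://global-endpoint:9000",
    "spark.hadoop.fs.s3a.path.style.access": "true",
    # Per-catalog override taken from the example above.
    "spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint": "http://aws-local:9000",
}

prod = effective_hadoop_conf(spark_conf, "hadoop_prod")
other = effective_hadoop_conf(spark_conf, "other_catalog")
```

In this sketch, `prod` picks up the per-catalog endpoint while still inheriting the global path-style setting, and `other` sees only the global values.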
Maybe add an example for `hadoop.hive.metastore.uris`, which is one of the most common use cases here.
Sure. I will update to that instead.
Sorry for the late response; I thought I had hit comment, but I had not.
Wouldn't Hive metastore URIs be set via the catalog's existing exposed `uri` parameter? E.g. `spark.sql.catalog.(catalog-name).uri`: https://github.com/apache/iceberg/blame/master/site/docs/spark-configuration.md#L60
Maybe I'll put `hadoop.hive.metastore.kerberos.principal=hadoop/_HOST@REALM` instead?
Yes, I don't think that we want to point to the metastore URI because that's what our uri property overrides.
Thanks for fixing this, @kbendick!
Adds a section to the Spark documentation on the website about how to override Hadoop configuration values per catalog.
This is a very simple explanation and I'm open to discussion on what should be added.
This closes issue #2907.
cc @rdblue