Hive: HiveCatalog listTables takes minutes if there are thousands tab…#3908

Merged
rdblue merged 5 commits into apache:master from BKBASE-Plugin:filter_iceberg_table_flag
Jan 24, 2022
Conversation

@vanliu-tx

Contributor

…les in namespace

This adds a flag that controls whether the HiveCatalog#listTables method filters its results down to Iceberg tables. Related to #3907

@hililiwei

Contributor

Could this be improved by switching to parallelStream?

@vanliu-tx

vanliu-tx commented Jan 17, 2022

Contributor Author

Could this be improved by switching to parallelStream?

I don't think so. Besides, in our production environment we use a different namespace naming scheme for each table type. For example, tables under the namespace iceberg_{biz_id} are all Iceberg tables, and tables under the namespace hive_{biz_id} are all Hive partitioned tables. In our environment, the call List<Table> tableObjects = clients.run(client -> client.getTableObjectsByName(database, tableNames)) is simply wasted time.
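To make the trade-off concrete, here is a minimal sketch of the idea, not the code in this PR. FakeMetastore, CountingMetastore, ListTablesSketch and the icebergOnly parameter are hypothetical stand-ins for illustration, and the sketch assumes Iceberg tables are recognizable by a table_type=ICEBERG table parameter. When filtering is off, only the cheap name listing runs and the per-table metadata fetch is skipped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a metastore client; this is NOT the Hive or Iceberg API.
interface FakeMetastore {
  // Cheap call: returns table names only.
  List<String> getAllTables(String database);

  // Expensive call: returns table name -> table parameters for the given names.
  Map<String, Map<String, String>> getTableParameters(String database, List<String> names);
}

// In-memory fake that counts how often the expensive call is made.
class CountingMetastore implements FakeMetastore {
  int expensiveCalls = 0;
  private final Map<String, Map<String, String>> tables = new LinkedHashMap<>();

  void addTable(String name, String tableType) {
    Map<String, String> params = new HashMap<>();
    if (tableType != null) {
      params.put("table_type", tableType);
    }
    tables.put(name, params);
  }

  @Override
  public List<String> getAllTables(String database) {
    return new ArrayList<>(tables.keySet());
  }

  @Override
  public Map<String, Map<String, String>> getTableParameters(String database, List<String> names) {
    expensiveCalls++;
    Map<String, Map<String, String>> result = new HashMap<>();
    for (String name : names) {
      result.put(name, tables.get(name));
    }
    return result;
  }
}

class ListTablesSketch {
  // icebergOnly is a hypothetical flag; the real property name is decided in the review.
  static List<String> listTables(FakeMetastore metastore, String database, boolean icebergOnly) {
    List<String> names = metastore.getAllTables(database);
    if (!icebergOnly) {
      // No filtering requested: skip the per-table metadata fetch entirely.
      return new ArrayList<>(names);
    }

    Map<String, Map<String, String>> params = metastore.getTableParameters(database, names);
    List<String> icebergTables = new ArrayList<>();
    for (String name : names) {
      Map<String, String> tableParams = params.get(name);
      // Assumption: Iceberg tables carry table_type=ICEBERG (compared case-insensitively).
      if (tableParams != null && "ICEBERG".equalsIgnoreCase(tableParams.get("table_type"))) {
        icebergTables.add(name);
      }
    }
    return icebergTables;
  }
}
```

With a namespace that is known to hold only Iceberg tables, calling listTables with icebergOnly set to false returns the same names while never touching the expensive call.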

@vanliu-tx

Contributor Author

@rdblue @jackye1995 could you help review this?

@hililiwei

Contributor

Does this only apply to HiveCatalog? If so, would it be better to name it 'hive.filter-iceberg-table' or 'hive.show-iceberg-table-only'?

Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/CatalogProperties.java Outdated
Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java Outdated
@rdblue

rdblue commented Jan 17, 2022

Contributor

Could this be improved by switching to parallelStream?

Iceberg doesn't use streams for parallelism because they are quite limited. When parallelizing operations, be sure to use Tasks instead.
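For readers unfamiliar with the concern: by default, parallel streams run on the JVM's shared common ForkJoinPool, so the caller has little control over where the work executes. The sketch below is a plain-JDK illustration of the alternative, running work on an explicit, caller-owned executor. ExplicitPoolSketch and mapInParallel are hypothetical names, and this does not show Iceberg's Tasks utility or its API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

class ExplicitPoolSketch {
  // Applies fn to every item on the supplied executor and returns results in input order.
  static <T, R> List<R> mapInParallel(List<T> items, Function<T, R> fn, ExecutorService pool) {
    List<Future<R>> futures = new ArrayList<>();
    for (T item : items) {
      Callable<R> task = () -> fn.apply(item);
      futures.add(pool.submit(task));
    }

    List<R> results = new ArrayList<>();
    for (Future<R> future : futures) {
      try {
        results.add(future.get());
      } catch (InterruptedException e) {
        // Restore the interrupt flag before surfacing the failure.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for a task", e);
      } catch (ExecutionException e) {
        throw new RuntimeException("Task failed", e.getCause());
      }
    }
    return results;
  }
}
```

The caller creates and shuts down the pool, which keeps thread count and lifecycle under the caller's control rather than tied to a JVM-wide default.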

@rdblue rdblue left a comment

Contributor

I think this option is a good idea, but it should be specific to Hive and we should carefully consider naming to make it clear.

@vanliu-tx

Contributor Author

Does this only apply to HiveCatalog? If so, would it be better to name it 'hive.filter-iceberg-table' or 'hive.show-iceberg-table-only'?

Yes, it only applies to HiveCatalog. I will rename the property to make that clearer.

Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java Outdated
@rdblue

rdblue commented Jan 18, 2022

Contributor

Thanks, @vanliu-tx. Looking close to ready. I think you just need to rename the property.

@vanliu-tx

Contributor Author

Thanks, @vanliu-tx. Looking close to ready. I think you just need to rename the property.

I was on vacation yesterday, sorry for the delay.

@vanliu-tx

Contributor Author

@rdblue could you help merge this PR?

Comment thread hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java Outdated
@rdblue rdblue merged commit e1c8016 into apache:master Jan 24, 2022
@rdblue

rdblue commented Jan 24, 2022

Contributor

Thanks, @vanliu-tx!
