Disaster recovery from outdated catalog backup with a table pointing to an older metadata.json
Background:
#3851 and follow-up PRs added support for registering a new table in the catalog that points to an existing metadata.json. It may be good to also support pointing an existing table in the catalog at a different metadata.json.
Justification:
There have been instances in the past where such a method would have been welcome. Unfortunately, the workaround is to update the catalog manually (i.e., update the Hive Metastore directly), or even to drop and recreate the table.
a. https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1654877950329549 (just one instance of trying to help a user)
b. Trino cannot read an Iceberg table that has dropped a partition field trinodb/trino#8284
c. Fixes read metadata failed after dropped partition for V1 format #3411
d. Core: Fix Partitions table for evolved partition specs #4560
a. Hive: Fix concurrent transactions overwriting commits by adding hive lock heartbeats. #5036
b. some problems in custom catalog implementations
Ideas:
Concerns:
This could of course be dangerous, but metadata.json already seems to be a user-exposed concept in the catalog API and in some utils. Users can achieve this today by dropping the table with purge=false and registering the metadata.json as a new table with the same name.
We could potentially run some kind of table consistency check (explore the whole reachable graph, and even the historic metadata graph, to validate that the new table metadata is consistent). This could be a separate utility that is generally useful.
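To make the drop-and-register workaround concrete, here is a minimal sketch against a toy in-memory catalog. `ToyCatalog`, `repoint_via_drop_and_register`, the table name, and the `s3://` paths are all hypothetical and only model the behavior described above; this is not the Iceberg catalog API.

```python
class ToyCatalog:
    """Hypothetical stand-in for a catalog: table name -> metadata.json location."""

    def __init__(self):
        self._tables = {}

    def register_table(self, name, metadata_location):
        # Like "register": refuses to overwrite an existing table entry.
        if name in self._tables:
            raise ValueError(f"table already exists: {name}")
        self._tables[name] = metadata_location

    def drop_table(self, name, purge=False):
        # With purge=False only the catalog entry is removed; files are kept.
        if purge:
            raise NotImplementedError("toy model: file deletion is not modeled")
        return self._tables.pop(name, None) is not None

    def metadata_location(self, name):
        return self._tables.get(name)


def repoint_via_drop_and_register(catalog, name, new_metadata_location):
    """The workaround: two separate catalog operations, so it is not atomic."""
    catalog.drop_table(name, purge=False)
    # Between these two calls the table does not exist in the catalog.
    catalog.register_table(name, new_metadata_location)


catalog = ToyCatalog()
# Hypothetical locations, for illustration only.
catalog.register_table("db.events", "s3://bucket/db/events/metadata/00005-a.metadata.json")
repoint_via_drop_and_register(
    catalog, "db.events", "s3://bucket/db/events/metadata/00003-b.metadata.json"
)
print(catalog.metadata_location("db.events"))
```

The sketch shows why a dedicated operation could be preferable: with the workaround there is a window in which the table is missing from the catalog, and a failure between the two calls leaves it dropped.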
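One possible shape for such a consistency check, sketched as a breadth-first walk over a toy reference graph. The `references` mapping (file to the files it points at, including the previous metadata.json for the historic graph), the `existing` set, and all file names are assumptions for illustration; a real utility would have to read these from the actual metadata, manifest lists, and manifests.

```python
from collections import deque


def find_missing_files(root, references, existing):
    """Walk everything reachable from `root`; return files not in `existing`."""
    missing = []
    seen = {root}
    queue = deque([root])
    while queue:
        path = queue.popleft()
        if path not in existing:
            missing.append(path)
            continue  # a missing file cannot be read, so its children are unknown
        for child in references.get(path, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return missing


# Toy graph: metadata.json -> manifest list -> manifest -> data file,
# plus a link from each metadata.json to the previous one.
references = {
    "v3.metadata.json": ["snap-2.avro", "v2.metadata.json"],
    "v2.metadata.json": ["snap-1.avro", "v1.metadata.json"],
    "snap-2.avro": ["m2.avro"],
    "snap-1.avro": ["m1.avro"],
    "m2.avro": ["data-2.parquet"],
    "m1.avro": ["data-1.parquet"],
}
# data-1.parquet is deliberately absent to simulate an inconsistent table.
existing = set(references) | {"data-2.parquet", "v1.metadata.json"}

missing = find_missing_files("v3.metadata.json", references, existing)
print(missing)
```

A check like this could run before the catalog pointer is switched, rejecting the new metadata.json if the returned list is non-empty.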