
Support catalog method to set table metadata #5163

Description

@szehon-ho

Background:

#3851 and follow-up PRs added support for registering a new table in a catalog that points to an existing metadata.json. It would also be useful to support repointing an existing table in the catalog to a different metadata.json.

Justification:

There have been instances in the past where such a method would have helped. Unfortunately, the current workaround is to update the catalog manually (i.e., edit the Hive Metastore directly), or even to drop and recreate the table.

  1. Disaster recovery from an outdated catalog backup, in which a table points to an older metadata.json
  2. Bugs that leave a table in a bad state because of a faulty metadata modification (in particular, partition field dropping bugs that made tables impossible to query):
    a. https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1654877950329549 (just one instance of trying to help a user)
    b. Trino cannot read an Iceberg table that has dropped a partition field trinodb/trino#8284
    c. Fixes read metadata failed after dropped partition for V1 format #3411
    d. Core: Fix Partitions table for evolved partition specs #4560
  3. Catalog consistency problems where metadata.json files get overwritten by one another:
    a. Hive: Fix concurrent transactions overwriting commits by adding hive lock heartbeats. #5036
    b. Some problems in custom catalog implementations

Ideas:

  1. Add a force option to catalog::registerTable
  2. Add a separate method to the catalog (reset table?)
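To make the two ideas concrete, here is a minimal Python sketch against a hypothetical in-memory catalog. `ToyCatalog`, `register_table(..., force=...)`, `reset_table`, and the exception names are illustrative assumptions, not the Iceberg `Catalog` API; the only behavior modeled is that a catalog maps a table identifier to the location of its current metadata.json.

```python
from dataclasses import dataclass, field


class TableAlreadyExistsError(Exception):
    pass


class NoSuchTableError(Exception):
    pass


@dataclass
class ToyCatalog:
    """Hypothetical toy catalog, not Iceberg's API.

    Maps a table identifier to the location of its current metadata.json.
    """
    tables: dict = field(default_factory=dict)

    def register_table(self, identifier: str, metadata_location: str, force: bool = False) -> None:
        # Idea 1: by default an existing identifier is rejected (assumed default);
        # force=True repoints the existing table instead.
        if identifier in self.tables and not force:
            raise TableAlreadyExistsError(identifier)
        self.tables[identifier] = metadata_location

    def reset_table(self, identifier: str, metadata_location: str) -> str:
        # Idea 2: a separate method that only repoints an existing table
        # and returns the previous location so the change can be undone.
        if identifier not in self.tables:
            raise NoSuchTableError(identifier)
        previous = self.tables[identifier]
        self.tables[identifier] = metadata_location
        return previous
```

Returning the previous location from `reset_table` is one possible design choice: it gives the caller what it needs to undo the change if the new metadata.json turns out to be wrong.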

Concerns:

This could, of course, be dangerous, but metadata.json already seems to be a user-exposed concept in the catalog API and in some utilities. Users can achieve this today by dropping the table with purge=false and registering the metadata.json as a new table with the same name.
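The drop-and-register workaround can be modeled in a few lines. This sketch assumes a hypothetical `FileBackedToyCatalog` whose storage is a plain set of file paths and whose metadata files live under `<table-location>/metadata/`; it is not Iceberg code. It shows why purge=false matters (purging would delete the very metadata.json you want to register) and that the workaround is two separate catalog calls rather than one atomic swap.

```python
class FileBackedToyCatalog:
    """Hypothetical toy model, not Iceberg's Catalog API.

    `storage` is a set of file paths; `tables` maps identifier -> metadata.json path.
    """

    def __init__(self, storage: set[str]):
        self.storage = storage
        self.tables: dict[str, str] = {}

    def register_table(self, identifier: str, metadata_location: str) -> None:
        if identifier in self.tables:
            raise ValueError(f"table already exists: {identifier}")
        if metadata_location not in self.storage:
            raise FileNotFoundError(metadata_location)
        self.tables[identifier] = metadata_location

    def drop_table(self, identifier: str, purge: bool) -> None:
        location = self.tables.pop(identifier)
        if purge:
            # Assumed layout: <table-location>/metadata/<file>.metadata.json.
            # Purging removes every file under the table location.
            table_location = location.rsplit("/metadata/", 1)[0] + "/"
            self.storage -= {p for p in self.storage if p.startswith(table_location)}


def repoint_table(catalog: FileBackedToyCatalog, identifier: str, target_metadata: str) -> None:
    # Step 1: drop without purging so the older metadata.json survives.
    catalog.drop_table(identifier, purge=False)
    # Until step 2 completes, the table does not exist in the catalog,
    # and a failure here leaves it unregistered.
    # Step 2: register the older metadata.json under the same name.
    catalog.register_table(identifier, target_metadata)
```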

We could potentially run some kind of table consistency check (exploring the whole reachable graph, and even the historical metadata graph, to validate that the new table metadata is consistent). This could be a separate utility that would be generally useful.
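A reachability check of this kind might start from the current snapshot and walk parent pointers, confirming that each referenced file still exists. The sketch below uses a hypothetical, heavily simplified `Snapshot` model (id, parent id, manifest list path) and a made-up `check_reachable` helper rather than Iceberg's real metadata classes; a full check would also need to visit manifests, data files, schemas, and partition specs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical simplified snapshot, not Iceberg's Snapshot class."""
    snapshot_id: int
    parent_id: Optional[int]
    manifest_list: str  # path of this snapshot's manifest list


def check_reachable(snapshots: list[Snapshot], current_id: Optional[int],
                    existing_files: set[str]) -> list[str]:
    """Walk the parent chain from the current snapshot and collect problems."""
    by_id = {s.snapshot_id: s for s in snapshots}
    problems: list[str] = []
    seen: set[int] = set()
    next_id = current_id
    while next_id is not None:
        if next_id in seen:
            problems.append(f"cycle at snapshot {next_id}")
            break
        seen.add(next_id)
        snap = by_id.get(next_id)
        if snap is None:
            if next_id == current_id:
                problems.append(f"missing current snapshot {next_id}")
            # Otherwise the ancestor was likely expired; retained history ends here.
            break
        if snap.manifest_list not in existing_files:
            problems.append(f"missing manifest list {snap.manifest_list}")
        next_id = snap.parent_id
    return problems
```

A parent id that is missing from the metadata is treated as the end of retained history rather than an error, since expiring snapshots can leave such dangling parent references.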
