Skip to content

Core: Use InternalData with avro for readers.#12476

Merged
danielcweeks merged 1 commit into
apache:mainfrom
danielcweeks:avro-internal-data
Mar 20, 2025
Merged

Core: Use InternalData with avro for readers.#12476
danielcweeks merged 1 commit into
apache:mainfrom
danielcweeks:avro-internal-data

Conversation

@danielcweeks

@danielcweeks danielcweeks commented Mar 7, 2025

Copy link
Copy Markdown
Contributor

This PR expands the use of InternalData for metadata reads to ManifestsLists and AllManifestsTable and uses a common iterable that exposes access to file metadata like Avro previously supported. Implements the new metadata read path for both Avro and Parquet.

Comment thread core/src/main/java/org/apache/iceberg/InternalData.java Outdated
Comment thread parquet/src/main/java/org/apache/iceberg/parquet/ParquetReader.java Outdated
@danielcweeks danielcweeks marked this pull request as draft March 7, 2025 17:27
@danielcweeks danielcweeks changed the title Core: Use InternalData with avro with common DataIterable for readers. Core: Use InternalData with avro and common DataIterable for readers. Mar 7, 2025
@danielcweeks danielcweeks force-pushed the avro-internal-data branch 2 times, most recently from 6f1eb95 to fa60870 Compare March 7, 2025 17:51
Comment thread core/src/main/java/org/apache/iceberg/ManifestReader.java Outdated
@danielcweeks danielcweeks marked this pull request as ready for review March 7, 2025 21:54
Comment thread core/src/main/java/org/apache/iceberg/AllManifestsTable.java
Comment thread core/src/main/java/org/apache/iceberg/InternalData.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/AllManifestsTable.java Outdated
@github-actions github-actions Bot added the API label Mar 12, 2025
.build()) {
metadata = headerReader.getMetadata();

if (headerReader instanceof AvroIterable) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can hunt down the metadata usage separately from this PR, but this preserves existing behavior while transitioning to the new InternalData read path.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is reasonable because it should flag cases where we are using the metadata to recover the partition spec in tests when running with Parquet or formats that don't support it.

@danielcweeks danielcweeks requested a review from rdblue March 20, 2025 17:32
@danielcweeks danielcweeks changed the title Core: Use InternalData with avro and common DataIterable for readers. Core: Use InternalData with avro for readers. Mar 20, 2025
@danielcweeks danielcweeks merged commit 21497fd into apache:main Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants