Skip to content

Core: Avro logical timestamp-millis support#14401

Merged
nastra merged 9 commits into
apache:mainfrom
jamespfaulkner:avro-logicla-timestamp-ms
Nov 7, 2025
Merged

Core: Avro logical timestamp-millis support#14401
nastra merged 9 commits into
apache:mainfrom
jamespfaulkner:avro-logicla-timestamp-ms

Conversation

@jamespfaulkner

Copy link
Copy Markdown
Contributor

Hey folks,

This PR addresses the issue #12395

I've contributed a slightly modified version of the original PR addressing this issue.

The differences are:

  • This includes changes to PlannedDataReader to account for the deprecation of DataReader
  • timestamp-millis is handled independently of timestamp-micros to account for data files written outside of iceberg.

I tested this locally on a small avro based hive table that was migrated to iceberg and queried via a IcebergGenerics:read scan:

  • Using the existing main branch, it recreates the IllegalArgumentException: Unknown logical type: org.apache.hive.iceberg.org.apache.avro.LogicalTypes$TimestampMillis issue
  • Using the PR changes resulted in that Exception no longer being thrown, but the wrong DateTime was being returned.
  • Both issues appeared to be fixed with these changes.

return ChronoUnit.NANOS.addTo(EPOCH, nanosFromEpoch).toLocalDateTime();
}

public static LocalDateTime timestampFromMillis(long millisFromEpoch) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add tests for anything that's new to TestDateTimeUtil

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
return GenericWriters.timestampNanos();

case "timestamp-millis":

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be a test that exercises this path

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I created a test within TestAvroDataWriter that exercised both of the code paths under "timestamp-millis". However, upon further investigation, the files written through Iceberg are either going to use the logical type timestamp-micros or timestamp-nanos (corresponding to TimestampType and TimestampNanoType. I dont think we want to introduce another type.

I'm therefore thinking this change isn't required in the DataWriter. It would introduce inconsistencies between avro.DataWriter and avro.DataReader, but this PR is really only required to address issues with avro data written outside of iceberg.

I'm therefore thinking it's better to remove the changes from DataWriter.
WDYT?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes indeed we should be removing "timestamp-millis" from the writers. Adding the possibility to read timestamp millis is fine, but we don't want to write it from Iceberg

@jamespfaulkner

Copy link
Copy Markdown
Contributor Author

@nastra Thanks for the review

Comment thread core/src/main/java/org/apache/iceberg/data/avro/GenericReaders.java Outdated
}
return GenericReaders.timestampNanos();

case "timestamp-millis":

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need a test for this code path

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the same tests we had for PlannedDataReader. Also added tests for the TZ code path too.

- Removed timestamp-millis from Writers
- Added tests for DataWriter and tz tests
@jamespfaulkner

Copy link
Copy Markdown
Contributor Author

@nastra Thanks for the feedback/followup review 👍

@nastra nastra requested review from huaxingao and pvary November 5, 2025 17:47
@huaxingao

Copy link
Copy Markdown
Contributor

Thanks @jamespfaulkner for the PR! It looks good to me overall. Could we also add a test for a pre-epoch timestamp case (e.g., 1965-01-01 10:11:12.123) and a test that exercises the new writer code paths? Thanks!

return TimestamptzNanoWriter.INSTANCE;
}

static ValueWriter<LocalDateTime> timestampMillis() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes to this file can be reverted, since this isn't used anywhere

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

class TestDataReader {

@Test
public void testTimestampDataReader() throws IOException {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please remove all of the test prefixes in the newly added tests, since those aren't really adding any value

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

- Tests for pre epoch
- remove the `test` prefix on the test methods
@jamespfaulkner

Copy link
Copy Markdown
Contributor Author

Thanks @nastra and @huaxingao.

I've updated to include pre epoch test cases.

and a test that exercises the new writer code paths
I've removed that writer now. We decided it wasn't needed but I'd forgotten to remove it previously.

@nastra nastra left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @jamespfaulkner for fixing this and @huaxingao for reviewing

@nastra nastra merged commit 843bb46 into apache:main Nov 7, 2025
43 checks passed
@jamespfaulkner

Copy link
Copy Markdown
Contributor Author

Thanks @huaxingao and @nastra . I appreciate the time spent reviewing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants