Core: Avro logical timestamp-millis support#14401
Conversation
…avro-logicla-timestamp-ms
| return ChronoUnit.NANOS.addTo(EPOCH, nanosFromEpoch).toLocalDateTime(); | ||
| } | ||
|
|
||
| public static LocalDateTime timestampFromMillis(long millisFromEpoch) { |
There was a problem hiding this comment.
please add tests for anything that's new to TestDateTimeUtil
| } | ||
| return GenericWriters.timestampNanos(); | ||
|
|
||
| case "timestamp-millis": |
There was a problem hiding this comment.
there should be a test that exercises this path
There was a problem hiding this comment.
So I created a test within TestAvroDataWriter that exercised both of the code paths under "timestamp-millis". However, upon further investigation, the files written through Iceberg are either going to use the logical type timestamp-micros or timestamp-nanos (corresponding to TimestampType and TimestampNanoType. I dont think we want to introduce another type.
I'm therefore thinking this change isn't required in the DataWriter. It would introduce inconsistencies between avro.DataWriter and avro.DataReader, but this PR is really only required to address issues with avro data written outside of iceberg.
I'm therefore thinking it's better to remove the changes from DataWriter.
WDYT?
There was a problem hiding this comment.
yes indeed we should be removing "timestamp-millis" from the writers. Adding the possibility to read timestamp millis is fine, but we don't want to write it from Iceberg
|
@nastra Thanks for the review |
| } | ||
| return GenericReaders.timestampNanos(); | ||
|
|
||
| case "timestamp-millis": |
There was a problem hiding this comment.
we also need a test for this code path
There was a problem hiding this comment.
Added the same tests we had for PlannedDataReader. Also added tests for the TZ code path too.
- Removed timestamp-millis from Writers - Added tests for DataWriter and tz tests
|
@nastra Thanks for the feedback/followup review 👍 |
|
Thanks @jamespfaulkner for the PR! It looks good to me overall. Could we also add a test for a pre-epoch timestamp case (e.g., 1965-01-01 10:11:12.123) and a test that exercises the new writer code paths? Thanks! |
| return TimestamptzNanoWriter.INSTANCE; | ||
| } | ||
|
|
||
| static ValueWriter<LocalDateTime> timestampMillis() { |
There was a problem hiding this comment.
changes to this file can be reverted, since this isn't used anywhere
| class TestDataReader { | ||
|
|
||
| @Test | ||
| public void testTimestampDataReader() throws IOException { |
There was a problem hiding this comment.
nit: please remove all of the test prefixes in the newly added tests, since those aren't really adding any value
- Tests for pre epoch - remove the `test` prefix on the test methods
|
Thanks @nastra and @huaxingao. I've updated to include pre epoch test cases.
|
nastra
left a comment
There was a problem hiding this comment.
LGTM, thanks @jamespfaulkner for fixing this and @huaxingao for reviewing
|
Thanks @huaxingao and @nastra . I appreciate the time spent reviewing! |
Hey folks,
This PR addresses the issue #12395
I've contributed a slightly modified version of the original PR addressing this issue.
The differences are:
PlannedDataReaderto account for the deprecation ofDataReadertimestamp-millisis handled independently oftimestamp-microsto account for data files written outside of iceberg.I tested this locally on a small avro based hive table that was migrated to iceberg and queried via a
IcebergGenerics:readscan:IllegalArgumentException: Unknown logical type: org.apache.hive.iceberg.org.apache.avro.LogicalTypes$TimestampMillisissue