Avro: Add variant readers and writers#12457
Conversation
9aad1b4 to
b18a880
Compare
| case ARRAY: | ||
| // for now, generate an object instead of an array | ||
| case OBJECT: | ||
| ShreddedObject object = Variants.object(metadata); |
There was a problem hiding this comment.
Seems we only have fully shredded objects here. Should we add partial shredded objects?
There was a problem hiding this comment.
This produces a variant in memory, not a serialized variant. The writer is free to serialize however it chooses, although this is currently not intended to exercise shredding paths that need records to have a more stable structure in order to make sense.
| @Override | ||
| public void write(T datum, Encoder encoder) throws IOException { | ||
| if (datum instanceof Serialized) { | ||
| encoder.writeBytes(((Serialized) datum).buffer()); |
There was a problem hiding this comment.
I assume this is why we're making Serialized public? I was looking to see if there were some other way, but this looks as good as anything.
There was a problem hiding this comment.
Yeah, that's correct. It seems reasonable to me to avoid the copy.
|
|
||
| @Override | ||
| public Variant fromRecord(IndexedRecord record, Schema schema, LogicalType type) { | ||
| int metadataPos = schema.getField("metadata").pos(); |
There was a problem hiding this comment.
nit: since we're using the field names in multiple places, it would be nice to pull them into a static member and not use a literal (e.g. VARIENT_METADATA_FIELD and VARIANT_VALUE_FIELD)
| public class RandomVariants { | ||
| private RandomVariants() {} | ||
|
|
||
| public static Variant randomVariant(Random random) { |
There was a problem hiding this comment.
A little bit of docs around the expected structure and what this produces would be nice for future testing. It takes a bit of visual inspection to understand the shape of what is returned by this method.
At some point it might be nice to enhance this so you can provide a size so that it will keep generating values until you reach a certain size. More importantly, it might be nice to take an Iceberg schema since that will be important for shredding.
There was a problem hiding this comment.
Yeah, I think when we add fuzzing tests for shredding, we'll want to pass in an Iceberg schema like you said.
danielcweeks
left a comment
There was a problem hiding this comment.
A couple comments/suggestions, but +1
|
Thanks for reviewing, @danielcweeks and @aihuaxu! I'll follow up on some of Dan's comments in later PRs. I'd like to get this one in so I can open PRs for other readers that will use the |
This adds readers and writers for Variants in Avro. Avro generics, Iceberg generics, and the internal data model are supported.
Variant values are tested using
RandomVariantsto extend random data generation.