Build: Bump Parquet-Java to 1.16.0#13941
Conversation
66bd095 to
6309f60
Compare
| List<FileScanTask> files = | ||
| StreamSupport.stream(table.newScan().planFiles().spliterator(), false) | ||
| .collect(Collectors.toList()); | ||
| assertThat(files.size()).as("Did not have the expected number of files").isEqualTo(numExpected); |
There was a problem hiding this comment.
Nit: I thought below is the preferred assertion style with better error msg
assertThat(files).as("Did not have the expected number of files").hasSize(numExpected);
There was a problem hiding this comment.
I changed this to easily set a breakpoint and inspect files. I left it like that since it might be helpful in the future.
There was a problem hiding this comment.
ok. I will leave it up to you.
changing the assertion doesn't prevent the capability of setting a breakpoint and inspecting files, since the files are collected before the assertion, as new lines 2171-2173 stay as they are in this PR.
There was a problem hiding this comment.
List<String> list = Arrays.asList("a", "b", "c");
assertThat(list).hasSize(4);
Above code will fail with the following helpful error msg
Expected size: 4 but was: 3 in:
["a", "b", "c"]
1.16.0 not here yet https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-avro/ 1.16.0RC2 already has 3 binding votes, just not officially released yet |
…fd-test-parquet-1-16
| // added to Parquet | ||
| // Preconditions.checkArgument( | ||
| // sType instanceof VariantType, "Invalid variant: %s is not a VariantType", sType); | ||
| } else if (sType instanceof VariantType |
There was a problem hiding this comment.
why do we need to allow both?
There was a problem hiding this comment.
Ideally we don't need both, but if there is any data that has been written without the annotation, then we can fallback to the Iceberg schema. The important part is that we set the annotations in TypeToMessageType.java.
There was a problem hiding this comment.
but if there is any data that has been written without the annotation, then we can fallback to the Iceberg schema.
Do we want to support this scenario?
There was a problem hiding this comment.
I assume this is for us supporting any Variants written by Spark in 4.0, otherwise we couldn't possibly import them right?
There was a problem hiding this comment.
Ah, I didn't think of that use-case. Spark 4 was released before the annotation, so I think you're right there 👍
There was a problem hiding this comment.
I believe in the Iceberg Schema / Spark Type first and if the annotation is missing, I think we should still just read and be happy :)
There was a problem hiding this comment.
@Fokko can we also add some comment to explain why we are checking both? other non-primitive types (like list and map) only checks the annotation.
There was a problem hiding this comment.
Agree that we need to handle both cases (missing annotation but sType is variant and the logical type is variant) to support the existing data.
Also should we switch to fallback the old way as
LogicalTypeAnnotation.variantType(Variant.VARIANT_SPEC_VERSION).equals(annotation) || sType instanceof VariantType.
There was a problem hiding this comment.
Updated and added the comments 👍
f960d54 to
7a2bf7d
Compare
stevenzwu
left a comment
There was a problem hiding this comment.
LGTM. let' see how CI goes once Parquet binaries are released.
|
push an empty commit to trigger ci, it takes a while :) |
| // added to Parquet | ||
| // Preconditions.checkArgument( | ||
| // sType instanceof VariantType, "Invalid variant: %s is not a VariantType", sType); | ||
| } else if (sType instanceof VariantType |
There was a problem hiding this comment.
Agree that we need to handle both cases (missing annotation but sType is variant and the logical type is variant) to support the existing data.
Also should we switch to fallback the old way as
LogicalTypeAnnotation.variantType(Variant.VARIANT_SPEC_VERSION).equals(annotation) || sType instanceof VariantType.
|
thanks @Fokko for the contribution, and @aihuaxu @RussellSpitzer @nastra @kevinjqliu for the reviews |
Note that you can "re-run" failed tests from the github workloads ui too (if you are on the PMC) |
I think it should also be available for committers. |

Parquet 1.16.0 is out: https://lists.apache.org/thread/hs3tkc3k9vtcogq08yk7zl2p855voyvc