Spark 4.0: Row Lineage support #13310
public void beforeEach() {
  assumeThat(formatVersion).isGreaterThanOrEqualTo(3);
  // ToDo: Remove these as row lineage inheritance gets implemented in the other readers
  assumeThat(fileFormat).isEqualTo(FileFormat.PARQUET);
maybe worth overriding parameters() in TestRowLevelOperationsWithLineage and defining a smaller test matrix, wdyt?
Yup agreed! I need to rebase and incorporate the latest test changes I made, which define a smaller test matrix (and will also remove the changes I made to SparkRowLevelOperationsTestBase).
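For reference, a minimal sketch of what a smaller test matrix could look like. This is plain Java without JUnit: the class names, the row layout (file format, format version), and the base matrix are hypothetical stand-ins, not the actual parameters of SparkRowLevelOperationsTestBase. Since a static `parameters()` method is hidden rather than overridden, the subclass simply declares its own.

```java
import java.util.Arrays;

public class LineageTestMatrixSketch {

  // Hypothetical stand-in for the shared base class; the real matrix carries more fields.
  static class BaseOperationsTest {
    static Object[][] parameters() {
      return new Object[][] {
        {"PARQUET", 2}, {"PARQUET", 3},
        {"ORC", 2}, {"ORC", 3},
        {"AVRO", 2}, {"AVRO", 3}
      };
    }
  }

  // Hypothetical stand-in for the lineage test class.
  static class LineageOperationsTest extends BaseOperationsTest {
    // Hides the base method and keeps only the combination the lineage tests can run today:
    // Parquet files on format version 3.
    static Object[][] parameters() {
      return new Object[][] {{"PARQUET", 3}};
    }
  }

  public static void main(String[] args) {
    for (Object[] row : LineageOperationsTest.parameters()) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

With a matrix like this, the `assumeThat` guards in `beforeEach()` would no longer need to skip most combinations at runtime.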
public NamedReference[] requiredMetadataAttributes() {
  NamedReference specId = Expressions.column(MetadataColumns.SPEC_ID.name());
  NamedReference partition = Expressions.column(MetadataColumns.PARTITION_COLUMN_NAME);
  if (TableUtil.supportsRowLineage(table)) {
nit: I'm fine either way, but I think it would be good to align how this is done here (stores named references in an array) with how it is done in SparkCopyOnWriteOperation (which stores named references in a list)
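One possible way to align the two, sketched under assumptions: collect the references in a list, append the lineage columns conditionally, and convert to an array at the end. Plain strings stand in for `NamedReference` objects so the snippet is self-contained; the helper class and the `supportsRowLineage` flag are hypothetical, and the column names are assumed to match Iceberg's metadata column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetadataAttributesSketch {

  // Hypothetical helper: column names stand in for NamedReference instances.
  static String[] requiredMetadataAttributes(boolean supportsRowLineage) {
    List<String> attributes = new ArrayList<>();
    attributes.add("_spec_id");
    attributes.add("_partition");

    // Lineage columns are only requested when the table supports row lineage.
    if (supportsRowLineage) {
      attributes.add("_row_id");
      attributes.add("_last_updated_sequence_number");
    }

    return attributes.toArray(new String[0]);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(requiredMetadataAttributes(false)));
    System.out.println(Arrays.toString(requiredMetadataAttributes(true)));
  }
}
```

Building a list first avoids duplicating the array literal in each branch, at the cost of one extra conversion.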
    .writeProperties(writeProperties)
    .build();

Function<InternalRow, InternalRow> extractRowLineage =
nit: maybe rowLineageExtractor or something along those lines? I only mention this because extractRowLineage sounds like a boolean flag
…stently for surfacing metadata columns, and include test refactorings that were done in 3.4/3.5
…isting metadata row
…mMetadata to RowLineageExtractor
…ow lineage decoration
This change implements Iceberg v3's row lineage feature for Spark 4.0. The approach uses the new conditional nullification mechanism introduced in Spark 4.0 instead of the custom rules that we implemented for 3.5.