Core: Add table property for ORC batch size #3133

Merged: rdblue merged 1 commit into apache:master from aokolnychyi:orc-batch-size-conf on Sep 17, 2021

@aokolnychyi

Contributor

This PR adds a new table property to control the ORC batch size. Previously, we used the Parquet batch size for ORC reads.
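As a rough illustration of the change described above, the sketch below picks the read batch size from a format-specific table property instead of reusing the Parquet setting for ORC. It is self-contained and does not use the Iceberg API: the `FileFormat` enum, the `intProp` helper, the property keys, and the default of 5000 are all assumptions made for this example, not values taken from this PR.

```java
import java.util.Map;

public class BatchSizeSketch {
  // Hypothetical stand-in for the file format enum used by the readers.
  enum FileFormat { PARQUET, ORC, AVRO }

  // Assumed property keys and default; check the project's table properties
  // for the actual names and values.
  static final String PARQUET_BATCH_SIZE = "read.parquet.vectorization.batch-size";
  static final String ORC_BATCH_SIZE = "read.orc.vectorization.batch-size";
  static final int DEFAULT_BATCH_SIZE = 5000;

  // Resolve the batch size for a given format from the table properties,
  // so that ORC no longer inherits the Parquet value.
  static int batchSize(Map<String, String> tableProps, FileFormat format) {
    switch (format) {
      case PARQUET:
        return intProp(tableProps, PARQUET_BATCH_SIZE, DEFAULT_BATCH_SIZE);
      case ORC:
        return intProp(tableProps, ORC_BATCH_SIZE, DEFAULT_BATCH_SIZE);
      default:
        throw new UnsupportedOperationException(
            "Batch reads are not supported for " + format);
    }
  }

  // Parse an integer property, falling back to the default when it is unset.
  static int intProp(Map<String, String> props, String key, int defaultValue) {
    String value = props.get(key);
    return value != null ? Integer.parseInt(value) : defaultValue;
  }

  public static void main(String[] args) {
    // Only the Parquet property is set; ORC falls back to its own default.
    Map<String, String> props = Map.of(PARQUET_BATCH_SIZE, "10000");
    System.out.println(batchSize(props, FileFormat.PARQUET)); // prints 10000
    System.out.println(batchSize(props, FileFormat.ORC));     // prints 5000
  }
}
```

With separate keys, tuning the Parquet batch size no longer changes ORC reads as a side effect.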

}
}

private int batchSize(FileFormat fileFormat) {

Contributor Author

This PR is a temporary solution until we pick up #3132, which will put this logic in one place for Spark 2 and 3.

@aokolnychyi

Contributor Author

Do we have to change anything in the Hive integration, @szlta @pvary?

private List<Expression> filterExpressions = null;
private Filter[] pushedFilters = NO_FILTERS;
private final boolean localityPreferred;
private final int batchSize;

Contributor

How was this set before? Was it just ignored?

@rdblue merged commit 604fd28 into apache:master on Sep 17, 2021

@rdblue commented Sep 17, 2021

Contributor

Thanks, @aokolnychyi!
