Flink: support limit pushdown in FLIP-27 source by stevenzwu · Pull Request #10748 · apache/iceberg

stevenzwu · 2024-07-22T17:36:10Z

No description provided.

czy006

LGTM

nastra · 2024-07-23T12:18:01Z

+    this.counter = new AtomicLong(0);
+  }
+
+  public boolean reachLimit() {


Suggested change

-  public boolean reachLimit() {
+  public boolean reachedLimit() {

nastra · 2024-07-23T12:21:39Z

+  private final AtomicLong counter;
+
+  private RecordLimiter(long limit) {
+    Preconditions.checkArgument(limit > 0, "Invalid limit: not a positive number");


Suggested change

-    Preconditions.checkArgument(limit > 0, "Invalid limit: not a positive number");
+    Preconditions.checkArgument(limit > 0, "Invalid limit: %s must be a positive number", limit);

nastra · 2024-07-23T12:24:02Z

+import org.apache.iceberg.flink.source.FileScanTaskReader;
+import org.apache.iceberg.io.FileIO;
+
+class LimitableDataIterator<T> extends DataIterator<T> {


It might make sense to add a small unit test to make sure this works as expected.

nastra · 2024-07-23T12:25:48Z

+
+  @Override
+  public T next() {
+    if (limiter != null) {


Maybe instead of having null checks everywhere we could have a no-op limiter that simply does nothing, wdyt? That way you'd end up using either the no-op limiter or a normal one with a valid limit.

I have moved the check of the long limit value (non-positive means unlimited) inside the RecordLimiter, which avoids adding a separate NoopRecordLimiter class. Please see if this is in line with what you are thinking.
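The approach described in the reply above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `RecordLimiterSketch` and the `create` and `increment` methods are assumptions made for the example, and only `reachedLimit` and the `AtomicLong` counter appear in the diff hunks quoted in this conversation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names and structure are illustrative, not copied from the PR.
final class RecordLimiterSketch {
  private final long limit;
  private final AtomicLong counter = new AtomicLong(0);

  private RecordLimiterSketch(long limit) {
    this.limit = limit;
  }

  // A non-positive limit is treated as "unlimited", so callers always hold
  // a limiter instance and never need a null check.
  static RecordLimiterSketch create(long limit) {
    return new RecordLimiterSketch(limit);
  }

  // True only when a positive limit is set and at least that many records were counted.
  boolean reachedLimit() {
    return limit > 0 && counter.get() >= limit;
  }

  // Called once per record handed to the caller.
  void increment() {
    counter.incrementAndGet();
  }
}
```

Folding the "unlimited" case into the limiter keeps a single code path in the iterator: it always asks `reachedLimit()` and never has to know whether a limit was configured.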

pvary · 2024-07-24T05:31:51Z

  @Override
  public DataIterator<RowData> createDataIterator(IcebergSourceSplit split) {
-    return new DataIterator<>(
+    return new LimitableDataIterator<>(


Is this limit applied after the residual filters?

The limit is applied after the residual filters. Residual filters are applied inside RowDataFileScanTaskReader, which is used and wrapped by the DataIterator.
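The ordering described in this answer can be illustrated with plain `java.util.Iterator` wrappers. This is only a sketch under stated assumptions: `FilterThenLimitSketch`, `filtered`, and `limited` are hypothetical names, with `filtered` standing in for the reader that applies residual filters and `limited` standing in for the wrapping iterator. None of this is Iceberg API.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch: shows why a limit applied by the outer iterator
// only counts rows that already passed the inner filter.
final class FilterThenLimitSketch {
  private FilterThenLimitSketch() {}

  // Stands in for the reader: yields only rows that pass the residual filter.
  static <T> Iterator<T> filtered(Iterator<T> source, Predicate<T> residual) {
    return new Iterator<T>() {
      private T nextRow;
      private boolean ready;

      @Override
      public boolean hasNext() {
        // Advance the source until a row passes the filter or the source is exhausted.
        while (!ready && source.hasNext()) {
          T candidate = source.next();
          if (residual.test(candidate)) {
            nextRow = candidate;
            ready = true;
          }
        }
        return ready;
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ready = false;
        return nextRow;
      }
    };
  }

  // Stands in for the wrapping iterator: stops after `limit` rows from the inner iterator.
  static <T> Iterator<T> limited(Iterator<T> inner, long limit) {
    return new Iterator<T>() {
      private long emitted;

      @Override
      public boolean hasNext() {
        return emitted < limit && inner.hasNext();
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        emitted++;
        return inner.next();
      }
    };
  }
}
```

Because the limit wraps the filtered iterator, rows rejected by the filter never count toward the limit. Wrapping in the opposite order could return fewer matching rows than requested.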


pvary · 2024-07-26T13:23:12Z

+    // Note that this query doesn't have the limit clause in the SQL.
+    // This assertion works because limit is pushed down to the reader and
+    // reader parallelism is 1.


Is the limit applied for every reader? Would it mean that if we have 4 readers, and 4 splits, then we will have 4 records in the result instead of 1?

That is correct, and it is exactly what I am trying to clarify here, because the SQL query has no limit clause. If the source parallelism is 4, there could be 4 readers and each may emit 1 record. Note that limit pushdown does not guarantee that the source emits only the limited number of records; the source only needs to make a best effort to stop early. The SQL limit clause and the SQL engine enforce the final result limit.
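A small simulation may make the per-reader behaviour concrete. It is a sketch under stated assumptions, not Flink or Iceberg code: `PerReaderLimitSketch`, `readWithPushedDownLimit`, and `engineLimit` are hypothetical names, each inner list models the records one reader would see, and the engine's final limit is modelled as a simple truncation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each reader enforces the pushed-down limit independently,
// so the source as a whole may emit up to (number of readers * limit) records.
final class PerReaderLimitSketch {
  private PerReaderLimitSketch() {}

  // Every reader stops early on its own count; there is no coordination across readers.
  static List<String> readWithPushedDownLimit(List<List<String>> recordsPerReader, long limit) {
    List<String> emitted = new ArrayList<>();
    for (List<String> readerRecords : recordsPerReader) {
      long count = 0; // per-reader counter
      for (String record : readerRecords) {
        if (count >= limit) {
          break; // best-effort early stop for this reader only
        }
        emitted.add(record);
        count++;
      }
    }
    return emitted;
  }

  // Models the SQL engine applying the final limit on top of the source output.
  static List<String> engineLimit(List<String> records, long limit) {
    int end = (int) Math.min(limit, records.size());
    return new ArrayList<>(records.subList(0, end));
  }
}
```

With four readers that each hold at least one record and a pushed-down limit of 1, `readWithPushedDownLimit` returns four records. Only the final `engineLimit` step, which corresponds to a `LIMIT` clause in the query, reduces the result to one.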

stevenzwu · 2024-07-29T22:35:43Z

thanks @czy006 @nastra @pvary for the review

stevenzwu requested review from nastra and pvary July 22, 2024 17:36

github-actions Bot added the flink label Jul 22, 2024

stevenzwu force-pushed the flip27-source-limit-pushdown branch from fb1cc36 to f8a67bc Compare July 22, 2024 17:59

Flink: support limit pushdown in FLIP-27 source

2061332

stevenzwu force-pushed the flip27-source-limit-pushdown branch from f8a67bc to 2061332 Compare July 22, 2024 18:44

czy006 approved these changes Jul 23, 2024

View reviewed changes

nastra reviewed Jul 23, 2024

View reviewed changes

stevenzwu added 2 commits July 23, 2024 09:23

address Eduard's comments

08673fd

Further clarify the comment in the unit test

a84d108

pvary reviewed Jul 24, 2024

View reviewed changes

nastra reviewed Jul 25, 2024

View reviewed changes

Comment thread .../v1.19/flink/src/main/java/org/apache/iceberg/flink/source/reader/LimitableDataIterator.java Outdated

nastra approved these changes Jul 25, 2024

View reviewed changes

Update flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sourc…

02010bc

…e/reader/LimitableDataIterator.java Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>

pvary reviewed Jul 26, 2024

View reviewed changes

pvary approved these changes Jul 29, 2024

View reviewed changes

stevenzwu merged commit f758593 into apache:main Jul 29, 2024

stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Jul 29, 2024

Flink: backport PR apache#10748 for limit pushdown

803958a

stevenzwu deleted the flip27-source-limit-pushdown branch July 30, 2024 02:41

stevenzwu added a commit that referenced this pull request Jul 30, 2024

Flink: backport PR #10748 for limit pushdown (#10813)

72b39ab

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024

Flink: support limit pushdown in FLIP-27 source (apache#10748)

70b0aa6

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024

Flink: backport PR apache#10748 for limit pushdown (apache#10813)

146ea29

czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025

Flink: support limit pushdown in FLIP-27 source (apache#10748)

1a19122

(cherry picked from commit f758593)

czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025

Flink: backport PR apache#10748 for limit pushdown (apache#10813)

3ae6209

(cherry picked from commit 72b39ab)

czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025

Flink: support limit pushdown in FLIP-27 source (apache#10748)

547e6cd

(cherry picked from commit f758593)

czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025

Flink: backport PR apache#10748 for limit pushdown (apache#10813)

0add36c

(cherry picked from commit 72b39ab)

stevenzwu merged 4 commits into apache:main from stevenzwu:flip27-source-limit-pushdown
