Flink: Pre-create fieldGetters to avoid constructing them for each row#10565
Conversation
|
Hi @nastra , the failing test cases seem unrelated to this change. Could you please re-trigger the tests? Thanks! |
|
@fengjiajie Rerunning the failed tests 👍 |
| */ | ||
| public static RowData clone( | ||
| RowData from, RowData reuse, RowType rowType, TypeSerializer[] fieldSerializers) { | ||
| RowData from, |
There was a problem hiding this comment.
Please create a deprecated version of the original public method to keep compatibility.
There was a problem hiding this comment.
Thanks for the reminder! I've added the original function below. The reason I didn't directly use the new function is that it requires FieldGetter for all fields, while the old function only needs to construct FieldGetter for fields with values. Constructing FieldGetter for all fields might negatively impact performance.
There was a problem hiding this comment.
I feel that performance degradation is acceptable for the deprecated methods if it is justified otherwise. The number of the duplicated lines seems high enough for me to try to avoid them.
The users could always do the caching of the getters themselves, or find another solution soon, since we remove the deprecated methods regularly.
WDYT?
Also CC-ed, @stevenzwu, who might be also interested in this PR
|
Thanks for the detailed explanation and the PR @fengjiajie! Left a few small comments. As a general rule we create a PR for a single version. Usually the latest. So if there are any review comments, they need to be fixed only once. |
@pvary Thanks for taking the time to review this! Please let me know if you have any suggestions. |
pvary
left a comment
There was a problem hiding this comment.
+1 from me. Let's wait a bit, to see if someone else is interested in reviewing. If there're no more comments, I will merge next Tuesday. (Ping me if I forgot 😄)
Thanks |
|
Thanks @fengjiajie for the improvement. |
apache#10565) * Flink: Pre-create fieldGetters to avoid constructing them for each row * modified as suggested * modified as suggested * modified as suggested
apache#10565) * Flink: Pre-create fieldGetters to avoid constructing them for each row * modified as suggested * modified as suggested * modified as suggested

I found that RowDataUtil.clone consumes a significant amount of CPU time by analyzing the flame graph. It constructs a FieldGetter object for each field of every row being processed. This commit optimizes this by constructing all FieldGetter objects upfront, avoiding repeated object creation.