Skip to content

Core: Optimize DeleteFileIndex#8157

Merged
aokolnychyi merged 1 commit into
apache:masterfrom
aokolnychyi:improve-delete-index
Jul 27, 2023
Merged

Core: Optimize DeleteFileIndex#8157
aokolnychyi merged 1 commit into
apache:masterfrom
aokolnychyi:improve-delete-index

Conversation

@aokolnychyi

@aokolnychyi aokolnychyi commented Jul 26, 2023

Copy link
Copy Markdown
Contributor

This PR improves and refactors DeleteFileIndex.

  • Avoid the cost of repeated conversion of min/max boundaries that damages the index lookup performance.
  • Use dataSequenceNumber from ContentFile instead of ManifestEntry to support distributed planning in the future.

This change relies on existing tests and adds a new benchmark.

Results prior to this change:

Benchmark                                                    Mode  Cnt  Score   Error  Units
PlanningBenchmark.localPlanningWithMinMaxFilter                ss    5  9.740 ± 2.333   s/op
PlanningBenchmark.localPlanningWithPartitionAndMinMaxFilter    ss    5  3.008 ± 0.044   s/op
PlanningBenchmark.localPlanningWithoutFilter                   ss    5  9.569 ± 1.309   s/op

Results after this change:

Benchmark                                                    Mode  Cnt  Score   Error  Units
PlanningBenchmark.localPlanningWithMinMaxFilter                ss    5  5.618 ± 1.297   s/op
PlanningBenchmark.localPlanningWithPartitionAndMinMaxFilter    ss    5  2.668 ± 0.210   s/op
PlanningBenchmark.localPlanningWithoutFilter                   ss    5  5.699 ± 0.595   s/op

This would be even more important for equality deletes.


// a delete file wrapper that caches the converted boundaries for faster boundary checks
// this class is not meant to be exposed beyond the delete file index
private static class IndexedDeleteFile {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is how I found the issue:
image


DeleteFileIndex(
Map<Integer, PartitionSpec> specsById,
Map<Integer, PartitionSpec> specs,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed specsById to stay on one line below.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to care about style like this later?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it is OK to do given that this PR already contains a few cosmetic changes.

}

DeleteFile[] forDataFile(long sequenceNumber, DataFile file) {
if (isEmpty) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to derive the partition and do anything with the streams if the index is empty.

Comment thread core/src/main/java/org/apache/iceberg/DeleteFileIndex.java Outdated
// read all of the matching delete manifests in parallel and accumulate the matching files in
// a queue
Queue<ManifestEntry<DeleteFile>> deleteEntries = new ConcurrentLinkedQueue<>();
Queue<DeleteFile> files = new ConcurrentLinkedQueue<>();

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need this change for distributed planning as well.


private static <T> boolean rangesOverlap(
Type.PrimitiveType type,
Types.NestedField field,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing field to stay on a single line when calling this method.

return Arrays.stream(files, start, files.length);
}

private static DeleteFileGroup index(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These index methods only exist because the old constructor is package private and is used in tests. I had to keep that for compatibility.

Comment thread core/src/main/java/org/apache/iceberg/DeleteFileIndex.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/DeleteFileIndex.java
if (convertedLowerBounds == null) {
synchronized (this) {
if (convertedLowerBounds == null) {
this.convertedLowerBounds = convertBounds(wrapped.lowerBounds());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea here to convert all bounds at once to avoid a null check in lowerBound?

If so, I don't think this is a good idea. The delete index is only going to use a few bound values (overlap uses only the equality delete's equality field ID set), so converting all of them is probably unnecessary. Plus, calling lowerBounds() in lowerBound(int) already incurs a null check to see if the bounds are converted. So lazily converting each lower bound is probably no more cost.

@aokolnychyi aokolnychyi Jul 26, 2023

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying method convertBounds only converts file path for position deletes and equality field ids for equality deletes. You are right, they are converted at the same time. I did that to avoid using a concurrent hash map and computeIfAbsent (which has performance issues).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Sounds good then.

@aokolnychyi aokolnychyi Jul 26, 2023

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still debating. We probably need a concurrent hash map to load each value one by one, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the problem I was talking about: https://bugs.openjdk.org/browse/JDK-8161372

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the problem I was talking about: https://bugs.openjdk.org/browse/JDK-8161372

This should help then: apache/uniffle#766

@aokolnychyi aokolnychyi Jul 27, 2023

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a bit more time thinking about this and I don't think it would be worth the extra complexity. Let's keep this as is for now. We only index equality IDs and all columns must be checked to discard a file.

I also checked Caffeine caches and they have some workarounds but I don't think we need them here.

Comment thread core/src/main/java/org/apache/iceberg/DeleteFileIndex.java Outdated

} else {
for (int id : equalityFieldIds()) {
Type type = spec.schema().findField(id).type();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked and spec.schema() is used in both conversions so the type should always match.

@rdblue rdblue left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks correct overall. I had some comments about mostly minor things.

@zinking

zinking commented Jul 27, 2023

Copy link
Copy Markdown
Contributor

@rdblue @aokolnychyi you might be interested in revisit this PR again #5760 .
as it helps further reduce timing of canContainDeleteFile

T deleteUpper) {
Type.PrimitiveType type = field.type().asPrimitiveType();
Comparator<T> cmp = Comparators.forType(type);
T dataLower = Conversions.fromByteBuffer(type, dataLowerBuf);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the most performance improvement comes here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.


return comparator.compare(deleteLower, dataUpper) <= 0
&& comparator.compare(dataLower, deleteUpper) <= 0;
return cmp.compare(deleteLower, dataUpper) <= 0 && cmp.compare(dataLower, deleteUpper) <= 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, maybe we can avoid the conversion for the upper. But this is a little improvement.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to match the checks we have for the file path.

@aokolnychyi aokolnychyi force-pushed the improve-delete-index branch from 745d9e3 to 32a3d1b Compare July 27, 2023 04:47
@aokolnychyi

Copy link
Copy Markdown
Contributor Author

@zinking, we have discussed it a bit during one of the community syncs. I'll take another look over the weekend.

@aokolnychyi

aokolnychyi commented Jul 27, 2023

Copy link
Copy Markdown
Contributor Author

I'll merge this one to unblock the distributed planning effort. Thanks everyone for reviewing!

@aokolnychyi aokolnychyi merged commit 9cc9a5d into apache:master Jul 27, 2023
zhongyujiang pushed a commit to zhongyujiang/iceberg that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants