Core: Fix for respecting custom location providers in SerializableTable #12564 #14280
Conversation
- Add missing .hasMessage() checks to assertThatThrownBy calls - Ensures compliance with AssertThatThrownByWithMessageCheck rule
- Add .hasMessage() check to multi-line assertThatThrownBy call - Ensures full compliance with AssertThatThrownByWithMessageCheck rule
There was a problem hiding this comment.
Thanks for the change @przemekd.
reading the issue it seems like this has been impacting multiple users, we have a 1.10.1 window open can you please post it in iceberg mailing thread for 1.10.1
https://lists.apache.org/thread/9tjs060vzs6nnghk2brcw0hv89h4drp0 for visibility.
| * | ||
| * @param <T> the type of the result | ||
| */ | ||
| public class Try<T> implements Serializable { |
There was a problem hiding this comment.
Have you considered vavr - https://github.com/vavr-io/vavr/tree/main/vavr it has similar utility, i think you use its getUnchecked and its apache licenced.
we might need some other community members to weigh in here too, my observation is in past, on an orthogonal context, we preferred a lib like failsafe rather than wriring something on our own.
There was a problem hiding this comment.
I feel like we can be even simpler to begin with here. I poked around locally, and I do agree we need some abstraction almost like a Try monad which encapsulates either a result of some kind or a failure. I originally was going to see if we could just use an abstraction over Tasks but ultimately, we need something to capture the state that it was a failure, not just handle the failure. We could use a separate AtomicReference and a boolean for indicating that a location provider could not be resolved, but that's pretty unclean.
That said, I don't think we need a full blown public class with all these APIs, at least as of now.
I think we could just have a package private class like the following with the main write API being of(SerializableSupplier<T> supplier) and the main read API being the getOrThrow.
class Try<T> implements Serializable {
public static <T> of(SerializableSupplier<T> supplier) {//does what capture does}
public static T getOrThrow()....
}
I think this will remove the need for a separate test suite just for this class. Until we know we'll need this for a wider set of use cases, I think we should just keep this as minimal as possible.
There was a problem hiding this comment.
@singhpk234 @amogh-jahagirdar I looked at Vavr, but for such limited use I chose not to add it. I pared the class down; it now catches Throwable and rethrows using sneakyThrow, following the same pattern as Vavr.
|
|
||
| // The exception should be thrown when locationProvider() is actually called | ||
| assertThatThrownBy(serializableTable::locationProvider).isSameAs(failure); | ||
|
|
There was a problem hiding this comment.
Should we add a test for round trip serialization to ensure the behavior is the same?
TestHelpers.KryoHelpers.roundTripSerialize(serializableTable)
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Thanks @przemekd , I agree with the general idea behind the fix, I just think we could have a simpler, and hidden Try construct to begin with.
| * | ||
| * @param <T> the type of the result | ||
| */ | ||
| public class Try<T> implements Serializable { |
There was a problem hiding this comment.
I feel like we can be even simpler to begin with here. I poked around locally, and I do agree we need some abstraction almost like a Try monad which encapsulates either a result of some kind or a failure. I originally was going to see if we could just use an abstraction over Tasks but ultimately, we need something to capture the state that it was a failure, not just handle the failure. We could use a separate AtomicReference and a boolean for indicating that a location provider could not be resolved, but that's pretty unclean.
That said, I don't think we need a full blown public class with all these APIs, at least as of now.
I think we could just have a package private class like the following with the main write API being of(SerializableSupplier<T> supplier) and the main read API being the getOrThrow.
class Try<T> implements Serializable {
public static <T> of(SerializableSupplier<T> supplier) {//does what capture does}
public static T getOrThrow()....
}
I think this will remove the need for a separate test suite just for this class. Until we know we'll need this for a wider set of use cases, I think we should just keep this as minimal as possible.
| public static <T> Try<T> capture(ThrowingSupplier<T> supplier) { | ||
| try { | ||
| return success(supplier.get()); | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
I'd handle Throwable for this particular case
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Thank you @przemekd . I think it mostly looks great, just had a comment that I think we should separate out the kryo assertions into a separate test method.
| Table kryoDeserialized = KryoHelpers.roundTripSerialize(serializableTable); | ||
| assertThatThrownBy(kryoDeserialized::locationProvider) | ||
| .isInstanceOf(RuntimeException.class) | ||
| .hasMessage("location provider failure"); |
There was a problem hiding this comment.
Could we have kryo test in a separate test? In case something breaks in the future it's easier to isolate if it's a kryo issue or not instead of having to poke into the tests.
There was a problem hiding this comment.
Sure thing, it makes total sense 👍🏻
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Great thank you @przemekd , looks good from my side. I'll leave this up for a bit for @singhpk234 @geruh to take another look
geruh
left a comment
There was a problem hiding this comment.
LGTM! the Try class now effectively defers the location provider outcomes during serialization. Internal testing of this workflow has been successful!
|
Thank you @przemekd and @geruh @singhpk234 for reviewing. I'll go ahead and merge. This is something that should be cherry-picked onto the 1.10.x branch |
…ble apache#12564 (apache#14280) (cherry picked from commit e667f64)
Closes #11527 in a way that does not break fixes provided by changes needed to fix #9029.
It eagerly calculates the location provider field, but wraps that into a newly implemented wrapper that only throws an error if code really needs to use that location provider. In case table needs to be read only by other processes the error is not thrown, even if there is an error with getting the location provider object.