GCP: Add scheduled refresh for storage credentials held by GCSFileIO#15696
Conversation
9574bba to
09f1ad0
Compare
singhpk234
left a comment
There was a problem hiding this comment.
The changes LGTM !
Have some optional suggestions, when seeing things from fresh set of eyes !
| } | ||
|
|
||
| try (OAuth2RefreshCredentialsHandler handler = | ||
| OAuth2RefreshCredentialsHandler.create(properties)) { |
There was a problem hiding this comment.
creating a OAuth2RefreshCredentialsHandler per refresh call would be expensive, it creates a http client and an AuthManager additionally, i wonder if we can just have this done one (invalidated / created again if props change) and just reuse that across refreshes ?
There was a problem hiding this comment.
I don't feel like this is too expensive. Typically, we're talking about possibly one invocation per hour (most queries will likely have none because the run within the timeout), so holding all of the http client thread pools and other resources is the more wasteful approach.
There was a problem hiding this comment.
Typically, we're talking about possibly one invocation per hour (most queries will likely have none because the run within the timeout)
If we create the handler lazily, we can achieve the same.
holding all of the http client thread pools and other resources is the more wasteful approach
should we trim down the http client pool to be way less than ~100 in the first place ? we technically don't need the whole HTTP Client pool (~100) for OAuth2RefreshCredsHandler (though i think pools connections might be lazily made) ? its gonna be mostly call to rest server by picking a connection from the pool .
My main suggestion was we are creating burst of connections / threads which we are discarding anyways after the credentials call completes, hence i though it was an expensive
09f1ad0 to
5b5ae17
Compare
…pache#15696) GCP: Add scheduled refresh for storage credentials held by GCSFileIO
The
GCSFileIOimplementation never refreshes the credentials that are held directly from the table load. The GCS storage clients use aOAuth2RefreshCredentialsHandlerthat internally refreshes the storage client, but those updates are not reflected back to the FileIO.If an
GCSFileIOinstance is serialized to remote workers, the credentials may be expired triggering a thundering herd of requests to refresh immediately.This change addresses this problem by proactively updating the credentials in the FileIO so that only valid credentials are propagated to remote clients.
This applies the same changes as was already done for S3FileIO in #15678