fix: serve still-valid cached AWS credentials when refresh fails by lancedb-robot · Pull Request #7506 · lance-format/lance

lancedb-robot · 2026-06-27T22:12:49Z

Problem

Production query nodes intermittently fail S3/DynamoDB operations with:

Failed to get AWS credentials: ... DispatchFailure { source: ConnectorError { kind: Timeout, source: HttpTimeoutError { kind: "HTTP connect", duration: 3.1s } } }

which surfaces to callers as a 500.

AwsCredentialAdapter (lance-io/src/object_store/providers/aws.rs) proactively refreshes credentials before they expire, starting credentials_refresh_offset (default 60s) ahead of the expiry time. During that window the cached credentials are still valid, but the adapter treated the cache as a miss and performed a blocking refresh. When that refresh hit a transient error from the underlying provider (e.g. an IMDS/STS HTTP connect timeout), get_credential discarded the still-valid cached credentials and returned a hard error.
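To make the refresh window concrete, here is a minimal sketch of the three states a cached credential can be in. The CacheState enum and the classify function are hypothetical names used only for illustration; they are not taken from aws.rs, and times are modelled as plain durations since an arbitrary epoch.

```rust
use std::time::Duration;

/// Where a cached credential sits relative to its hard expiry.
/// (Hypothetical type for illustration; not part of lance-io.)
#[derive(Debug, PartialEq, Eq)]
enum CacheState {
    /// More than `refresh_offset` away from expiry: serve from cache.
    Fresh,
    /// Within `refresh_offset` of expiry but still valid: a refresh is attempted.
    RefreshWindow,
    /// At or past the hard expiry: the credential must not be served.
    Expired,
}

/// Classify a cached credential. All arguments are durations measured
/// from the same arbitrary epoch.
fn classify(now: Duration, expires_at: Duration, refresh_offset: Duration) -> CacheState {
    if now >= expires_at {
        CacheState::Expired
    } else if now + refresh_offset >= expires_at {
        CacheState::RefreshWindow
    } else {
        CacheState::Fresh
    }
}
```

The failure described above happens only in the RefreshWindow state: the credential is still usable, yet a failed refresh was turned into an error.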

The same adapter backs both the S3 object store and the DynamoDB external manifest store (via OSObjectStoreToAwsCredAdaptor), so a single transient credential-provider blip during the refresh window turns into a failed request.

This is the gap left by the earlier credential-caching/refresh-offset work: hard expiry was handled, but a failed proactive refresh was not.

Fix

When a refresh fails but the cached credentials have not actually expired yet, fall back to the cached credentials and log a warning; the next call retries the refresh. Truly-expired credentials still surface the error rather than being served.
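The fallback rule can be sketched roughly as follows. This is a simplified, synchronous model under stated assumptions, not the code in this PR: Creds, CachedCreds, and the closure-based refresh parameter are hypothetical, errors are plain strings, and time is passed in explicitly so the behaviour is deterministic.

```rust
use std::time::Duration;

/// Hypothetical credential with a hard expiry (duration since an arbitrary epoch).
#[derive(Clone, Debug, PartialEq)]
struct Creds {
    key: String,
    expires_at: Duration,
}

/// Hypothetical stand-in for the adapter's cache; not the real AwsCredentialAdapter.
struct CachedCreds {
    cached: Option<Creds>,
    refresh_offset: Duration,
}

impl CachedCreds {
    fn get(
        &mut self,
        now: Duration,
        refresh: impl FnOnce() -> Result<Creds, String>,
    ) -> Result<Creds, String> {
        // Outside the refresh window: serve the cache without calling the provider.
        if let Some(c) = &self.cached {
            if now + self.refresh_offset < c.expires_at {
                return Ok(c.clone());
            }
        }
        match refresh() {
            Ok(fresh) => {
                self.cached = Some(fresh.clone());
                Ok(fresh)
            }
            Err(err) => match &self.cached {
                // Refresh failed, but the cached credentials have not hard-expired:
                // warn and serve them. The next call will retry the refresh.
                Some(c) if now < c.expires_at => {
                    eprintln!("warning: credential refresh failed, serving cached: {}", err);
                    Ok(c.clone())
                }
                // Empty cache (cold start) or truly expired: surface the error.
                _ => Err(err),
            },
        }
    }
}
```

Because the cache is left untouched on failure, every call inside the window attempts the refresh again, so a recovered provider is picked up as soon as it responds.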

Added a unit test (test_aws_credential_adapter_falls_back_to_cached_on_refresh_failure) using a provider that succeeds once and then fails, asserting that the still-valid cached credentials are served while valid, and that an error is returned once they expire.
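A provider that succeeds once and then fails can be modelled with a call counter. The sketch below is an assumption about the shape of such a test double, not the helper used in the actual test: SucceedOnceProvider and its methods are hypothetical, and it returns plain strings rather than real AWS credentials.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical test double: the first call succeeds, every later call fails,
/// simulating a provider that becomes unreachable after the initial fetch.
struct SucceedOnceProvider {
    calls: AtomicUsize,
}

impl SucceedOnceProvider {
    fn new() -> Self {
        Self { calls: AtomicUsize::new(0) }
    }

    /// Returns a fake credential on the first call and an error afterwards.
    fn provide(&self) -> Result<String, String> {
        let previous_calls = self.calls.fetch_add(1, Ordering::SeqCst);
        if previous_calls == 0 {
            Ok("cached-key".to_string())
        } else {
            Err("simulated connect timeout".to_string())
        }
    }

    /// Number of times `provide` has been invoked, useful for asserting
    /// that a refresh was attempted on each call inside the window.
    fn call_count(&self) -> usize {
        self.calls.load(Ordering::SeqCst)
    }
}
```

Using an atomic counter keeps provide callable through a shared reference, which is convenient when the provider is held behind an Arc by the code under test.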

Notes

This addresses transient failures during the refresh window. A cold-start credential fetch (empty cache) against an unreachable IMDS/STS endpoint will still fail, as there is nothing valid to fall back to.
cargo fmt/cargo clippy could not be run in this environment (the pinned toolchain's rustfmt/clippy components are unavailable offline). The change was kept formatting-clean by hand; please confirm in CI.

AwsCredentialAdapter proactively refreshes credentials before they expire, starting credentials_refresh_offset (default 60s) ahead of the expiry time. When that proactive refresh hit a transient error from the underlying provider (e.g. an IMDS/STS HTTP connect timeout), get_credential discarded the still-valid cached credentials and returned a hard error, surfacing as a 500 for S3 and DynamoDB operations. Fall back to the cached credentials when a refresh fails but the cached credentials have not actually expired yet; the next call retries the refresh. Truly-expired credentials still surface the error rather than being used.

Katomoto

Logic looks correct. One suggestion: the early return on line prevents the cleanup function from running — worth adding a finally block.

codecov · 2026-06-27T22:52:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


github-actions bot added the labels A-encoding (Encoding, IO, file reader/writer) and bug (Something isn't working) on Jun 27, 2026

Katomoto reviewed Jun 27, 2026

lancedb-robot wants to merge 1 commit into lance-format:main from lancedb-robot:backburner/aws-cred-refresh-fallback
