
gh-151613: Fix remote debugging frame cache ABA #151614

Merged
pablogsal merged 2 commits into python:main from pablogsal:gh-151613-remote-debugging-frame-cache-aba on Jun 27, 2026

pablogsal (Member) commented Jun 17, 2026

Fixes #151613.

The remote debugging frame cache previously used only the last_profiled_frame address as its cache anchor. If a frame returned and a later frame reused the same _PyInterpreterFrame address, the profiler could accept a stale cache entry and splice parent frames from a different call chain into the current stack.
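To make the failure mode concrete, here is a toy Python model of an address-only cache. Everything in it (Frame, cache, store, lookup, the addresses) is hypothetical and far simpler than the real _remote_debugging code; it only illustrates how a reused frame slot lets a stale entry splice in callers from a different call chain.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Frame:
    address: int   # hypothetical stand-in for a _PyInterpreterFrame address
    function: str  # name of the function running in that frame


# Address-only cache: anchor address -> caller names recorded below the anchor.
cache: dict[int, list[str]] = {}


def store(anchor: Frame, parents: list[str]) -> None:
    cache[anchor.address] = parents


def lookup(anchor: Frame) -> list[str] | None:
    # Only the address is compared, so a different frame that reuses the
    # same slot is indistinguishable from the one that was cached.
    return cache.get(anchor.address)


# Sample 1: main -> branch_a -> work, with `work` living at 0x1000.
store(Frame(0x1000, "work"), ["branch_a", "main"])

# `work` and `branch_a` return; a new chain main -> branch_b -> work puts
# its `work` frame in the same slot, 0x1000.
current = Frame(0x1000, "work")
stack = [current.function] + lookup(current)
print(stack)  # ['work', 'branch_a', 'main']: stale, the real caller is branch_b
```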

This adds a last_profiled_frame_seq counter next to last_profiled_frame, increments it when the anchor advances, stores it in frame cache entries, and validates cache hits against both the frame address and the sequence. Cache miss walks now copy stack chunks before storing new cache entries so stored continuations come from a stable snapshot. The new regression test exercises alternating call chains and checks that cached stacks never contain frames from both branches.
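The validation rule described above can be sketched in the same toy style. ThreadState, CacheEntry, FrameCache and advance_anchor are hypothetical names, and the sketch assumes the counter is bumped on every anchor move; it models only the address-plus-sequence check, not the stack-chunk copying done on cache misses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ThreadState:
    # Hypothetical mirrors of last_profiled_frame / last_profiled_frame_seq.
    last_profiled_frame: int = 0
    last_profiled_frame_seq: int = 0

    def advance_anchor(self, address: int) -> None:
        # Bump the sequence every time the anchor moves, even when the new
        # anchor address equals one that was used before.
        self.last_profiled_frame = address
        self.last_profiled_frame_seq += 1


@dataclass(frozen=True)
class CacheEntry:
    anchor_address: int
    anchor_seq: int
    parents: tuple[str, ...]


class FrameCache:
    def __init__(self) -> None:
        self._entries: dict[int, CacheEntry] = {}

    def store(self, ts: ThreadState, parents: list[str]) -> None:
        # Record the sequence alongside the address the entry is keyed on.
        entry = CacheEntry(ts.last_profiled_frame,
                           ts.last_profiled_frame_seq, tuple(parents))
        self._entries[entry.anchor_address] = entry

    def lookup(self, ts: ThreadState) -> tuple[str, ...] | None:
        entry = self._entries.get(ts.last_profiled_frame)
        # A hit requires both the address and the sequence to match.
        if entry is None or entry.anchor_seq != ts.last_profiled_frame_seq:
            return None
        return entry.parents


ts = ThreadState()
cache = FrameCache()
ts.advance_anchor(0x1000)              # anchor: `work` called from branch_a
cache.store(ts, ["branch_a", "main"])
assert cache.lookup(ts) == ("branch_a", "main")  # genuine hit

ts.advance_anchor(0x2000)              # anchor moves as frames return
ts.advance_anchor(0x1000)              # new `work` (via branch_b) reuses 0x1000
print(cache.lookup(ts))  # None: same address, different sequence
```

Keying the dictionary on the address alone keeps lookups cheap; the sequence acts as a generation tag that turns an address collision into a cache miss instead of a wrong stack.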

pablogsal force-pushed the gh-151613-remote-debugging-frame-cache-aba branch from 000eedb to 9f447d8 on June 17, 2026 21:25
pablogsal marked this pull request as ready for review on June 17, 2026 21:31
pablogsal force-pushed the gh-151613-remote-debugging-frame-cache-aba branch 2 times, most recently from 418a947 to a548b24, on June 17, 2026 21:46
Review comment thread on Include/internal/pycore_interpframe.h
Review comment thread on Lib/test/test_external_inspection.py (outdated)

pablogsal (Member, Author) commented:

Discussed this a bit with @maurycy offline.

I’m going to keep this PR as the small ABA/cache-anchor fix. The sequence here is tied to last_profiled_frame: it exists to detect “same _PyInterpreterFrame address, but no longer the same anchored frame”.

The epoch idea, where we bump the counter on any frame pop while profiling is active, is worth exploring, but it changes the meaning of the counter and makes the cache more conservative. I think that belongs in a follow-up, backed by the Oracle data and perf/cache-hit numbers.
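The difference between the two counter meanings can be illustrated with a toy model. Thread, still_valid and the policy names "anchor" and "epoch" are hypothetical, and the model assumes the anchor moves when the profiler samples and when the anchored frame returns. It shows one way the epoch variant is more conservative: a call that returns above the anchor invalidates the cached entry under the epoch policy but not under the anchor policy.

```python
from __future__ import annotations


class Thread:
    """Toy thread whose frames are plain addresses, innermost last."""

    def __init__(self, policy: str) -> None:
        assert policy in ("anchor", "epoch")
        self.policy = policy
        self.stack: list[int] = []
        self.anchor: int | None = None
        self.seq = 0

    def _move_anchor(self, address: int | None) -> None:
        self.anchor = address
        self.seq += 1

    def push(self, address: int) -> None:
        self.stack.append(address)

    def pop(self) -> None:
        popped = self.stack.pop()
        if popped == self.anchor:
            # The anchored frame returned, so the anchor advances to its caller.
            self._move_anchor(self.stack[-1] if self.stack else None)
        elif self.policy == "epoch":
            # Epoch policy: any pop while profiling bumps the counter.
            self.seq += 1

    def sample(self) -> tuple[int | None, int]:
        # The profiler anchors at the innermost frame and keys its cache
        # entry on (anchor address, counter).
        self._move_anchor(self.stack[-1])
        return self.anchor, self.seq


def still_valid(thread: Thread, key: tuple[int | None, int]) -> bool:
    return (thread.anchor, thread.seq) == key


def hit_after_unrelated_call(policy: str) -> bool:
    t = Thread(policy)
    t.push(0x1000)  # main
    t.push(0x1040)  # work, sampled as the anchor
    key = t.sample()
    t.push(0x1080)  # helper called from work ...
    t.pop()         # ... returns; the anchored frame never left the stack
    return still_valid(t, key)


print(hit_after_unrelated_call("anchor"))  # True
print(hit_after_unrelated_call("epoch"))   # False
```

In this model both policies still reject the ABA case, because the anchored frame's return bumps the counter either way; the epoch variant only adds misses, which is the cost the perf/cache-hit numbers would measure.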

I’m also dropping the new test from Lib/test. It’s a good witness, but it depends on timing and address reuse, so it fits better in the Oracle/stress harness than as a buildbot pass/fail test.

pablogsal enabled auto-merge (squash) on June 27, 2026 16:29
pablogsal merged commit 8cda6ae into python:main on Jun 27, 2026
54 checks passed
pablogsal deleted the gh-151613-remote-debugging-frame-cache-aba branch on June 27, 2026 16:56
pablogsal added the "needs backport to 3.15" label (pre-release feature fixes, bugs and security fixes) on Jun 27, 2026
miss-islington-app commented:
Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.15.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

miss-islington-app commented:
Sorry, @pablogsal, I could not cleanly backport this to 3.15 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 8cda6ae2f1f86f2d26c29586ffc9687b410abfcf 3.15

pablogsal added a commit to pablogsal/cpython that referenced this pull request Jun 27, 2026
(cherry picked from commit 8cda6ae)
pablogsal added a commit that referenced this pull request Jun 27, 2026
gh-151613: Fix remote debugging frame cache ABA (#151614)

(cherry picked from commit 8cda6ae)

Labels

needs backport to 3.15 (pre-release feature fixes, bugs and security fixes)

Development

Successfully merging this pull request may close these issues.

_remote_debugging frame cache can reuse stale frame anchors

2 participants