Skip to content

OrderedDict: iterator constructor odictiter_new is unlocked under free-threading -- use-after-free SIGSEGV (GH-133734 follow-up to gh-125996) #152537

Description

@robertsdotpm

Crash report

What happened?

OrderedDict: iterator constructor odictiter_new is unlocked under free-threading — use-after-free SIGSEGV (GH-133734 follow-up to gh-125996)

Prelude

It seems there's been a few bugs found in OrderedDict in free-threaded already. So I wanted to get on 3.14 myself to ensure I was up to date. Hopefully this one is new for you all. I've attached a suggested patch and double-checked the sigfault producer actually sigfaults.

ordereddict_odictiter_new_ft_fix.patch

Summary

gh-125996 ("nogil segmentation fault on ordered dict operations") was closed as fixed by GH-133734 ("fix thread safety of ordered dict"). That PR added Py_BEGIN_CRITICAL_SECTION(od) to the iterator advance path (odictiter_nextkey / _odict_find_node, which now asserts the lock is held), but it left the iterator constructor odictiter_new unlocked. As a result, concurrent iteration vs popitem/__delitem__ still SIGSEGVs on a free-threaded build. This is a use-after-free (memory unsafety), still present on main.

Reproducer

Pure stdlib (threading + collections.OrderedDict), GC disabled to rule out the cyclic collector:

import collections, gc, threading
gc.disable()
N = 8
od = collections.OrderedDict((k, k) for k in range(2000))
go = threading.Barrier(N)

def churn():
    go.wait()
    for i in range(2_000_000):
        try: od.popitem(last=(i & 1) == 0)
        except KeyError: pass
        k = i & 4095
        od[k] = k
        try: od.move_to_end(k, last=(i & 2) == 0)
        except KeyError: pass

def walk():
    go.wait()
    for _ in range(2_000_000):
        try:
            n = 0
            for _key in od:          # odictiter_iternext -> SIGSEGV
                n += 1
                if n >= 64: break
        except (RuntimeError, KeyError): pass

ts = ([threading.Thread(target=churn) for _ in range(N // 2)] +
      [threading.Thread(target=walk)  for _ in range(N - N // 2)])
for t in ts: t.start()
for t in ts: t.join()
print("clean", len(od))

Observed

CPython 3.14.6 free-threaded build (PYTHON_GIL=0), Linux x86-64:

  • GIL OFF: 12/12 runs SIGSEGV (typically within ~1s)
  • GIL ON: 0/12 runs crash
  • gc.disable() still crashes → it is the refcount/iterator race, not the cyclic collector
  • Verified on current main: odictiter_new still has no critical section.

GDB (consistent across crashes):

#0  PyObject_Hash (v=...) at Objects/object.c   <- v->ob_type == 0x0 (NULL / freed)
#1  _odict_find_node (key=v, od=...) at Objects/odictobject.c
#2  odictiter_nextkey_lock_held (di=...) at Objects/odictobject.c
#3  odictiter_iternext ...

The faulting object is the iterator's saved di_current (the head/tail key captured at construction), read as freed memory (ob_type == NULL). The di_state == od_state / di_size guards both pass at the crash, so they do not protect against this.

Root cause

odictiter_new (Objects/odictobject.c) captures the first/last node and its key without holding the dict's critical section:

node = reversed ? _odict_LAST(od) : _odict_FIRST(od);          // unlocked read
di->di_current = node ? Py_NewRef(_odictnode_KEY(node)) : NULL; // unlocked read of node->key

tp_iter (for x in od) and the keys/values/items/reversed iterators all reach odictiter_new with no lock held. Concurrently, popitem/__delitem__ runs _odict_remove_node + _odictnode_DEALLOC (Py_DECREF(node->key); PyMem_Free(node)) under the dict lock. So odictiter_new can read the head node / node->key while another thread unlinks and frees that node, capturing a dangling key into di_current; the first __next__ then hashes it → SIGSEGV.

GH-133734 locked the advance path but not the constructor, so the window simply moved to iterator creation.

Suggested fix

Wrap the head/tail + key capture in odictiter_new in Py_BEGIN_CRITICAL_SECTION(od) / Py_END_CRITICAL_SECTION() (mirroring odictiter_nextkey), so iterator construction observes a consistent locked snapshot of the node and its key:

Py_BEGIN_CRITICAL_SECTION(od);
node = reversed ? _odict_LAST(od) : _odict_FIRST(od);
di->di_current = node ? Py_NewRef(_odictnode_KEY(node)) : NULL;
di->di_size = PyODict_SIZE(od);
di->di_state = od->od_state;
Py_END_CRITICAL_SECTION();

Found via a concurrent-OrderedDict stress test; reduced to the standalone repro above. Filing as a follow-up because gh-125996 is closed. Happy to open a PR for the one-section fix if useful.

CPython versions tested on:

3.14

Operating systems tested on:

No response

Output from running 'python -VV' on the command line:

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions