Python Magic Methods — Set Corruption When __hash__ Missing
Orders go missing from sets when __hash__ is wrong: defining __eq__ without __hash__ raises TypeError, and hashing a mutable field silently corrupts the set.
20+ years shipping production Python across data and backend systems. Lessons pulled from things that broke in production.
- Magic methods are hooks Python calls automatically on operator, attribute, or built-in usage — they are looked up on the class type, not the instance, so monkey-patching a dunder on a single object has no effect
- __init__ initializes instances; __new__ controls creation itself — override __new__ only for singletons or immutable subclasses
- __repr__ must be unambiguous for debugging; __str__ is user-friendly — if you only implement one, implement __repr__
- __eq__ and __hash__ must agree: defining __eq__ without __hash__ makes the object unhashable (TypeError on set or dict use), and hashing a mutable field silently corrupts sets and dicts
- __slots__ saves 40–60% memory per instance but requires every class in the hierarchy to define its own slots, and @dataclass(slots=True) is the safer way to get the same benefit in Python 3.10+
- __getattr__ is a fallback for missing attributes only; __getattribute__ intercepts every access — confusing the two is a common source of silent bugs
- __call__ makes any instance callable; __enter__ and __exit__ power the with statement — both are more production-common than most tutorials acknowledge
Magic methods (dunders) are Python's protocol hooks — double-underscore methods that CPython's interpreter dispatches to when you use syntax like obj[key], with obj:, or a == b. They're not 'magic' in the sense of runtime reflection; they're a concrete contract between your objects and the interpreter's C-level dispatch tables.
Every for x in obj triggers __iter__, every str(obj) calls __str__ (falling back to __repr__ when __str__ is not defined), and every set insertion calls __hash__ and then, on a hash match, __eq__. Understanding this dispatch is critical because violating the implicit contracts breaks hash-based collections (dicts, sets): defining __eq__ without __hash__ makes objects unhashable, and, worse, a __hash__ that disagrees with __eq__ or changes during the object's lifetime silently allows duplicates and lost entries.
Real-world bugs from this: Django model instances losing uniqueness in sets, Redis cache keys silently colliding, and ORM identity maps returning stale objects. The fix is always explicit: either set __hash__ = None to prevent accidental use, or implement both methods consistently using immutable fields.
Imagine you buy a fancy coffee machine. Out of the box it already knows how to turn on, make a sound when it is done, and show its status on a little screen — you did not program any of that, it just came built-in. Python magic methods are the same idea: they are pre-agreed slots that Python calls automatically when certain things happen to your object, like printing it, adding two of them together, or checking if they are equal. You fill in the slot, Python does the calling. The important detail is that Python looks for that slot on the class, not on the individual object. You cannot give one specific coffee machine a custom startup sound by sticking a label on it — you have to update the model's factory spec.
Every time Python evaluates len(my_list), compares two objects with ==, or prints something with print(), it is secretly delegating that work to a special method buried inside the object's class. This is not magic in the stage-trick sense — it is a precisely defined protocol that makes Python's data model tick. Understanding it is the difference between writing classes that play nicely with the rest of Python's ecosystem and writing classes that feel bolted-on and awkward.
Before magic methods existed as a concept, languages forced you to register callbacks or inherit from a god-class just to make your objects behave like built-in types. Python solved this with a clean contract: implement a double-underscore method with a specific name (hence 'dunder'), and the interpreter will call it at the right moment. The result is that your custom Vector class can support +, len(), slicing, context managers, and even pickling — all without inheriting from anything.
By the end of this article you will understand not just the syntax but the CPython internals that make these calls happen, the subtle ordering rules Python follows when resolving them, the performance traps hiding in __getattr__ and __slots__, how __call__ and __enter__/__exit__ work in production, why __del__ is almost always the wrong answer to resource cleanup, and the patterns senior engineers use in production libraries. You will also have concrete answers for the interview questions that trip up even experienced Python developers.
What Are Magic Methods? The CPython Dispatch Contract
Magic methods are special method names with double underscores on both ends that Python's interpreter calls implicitly when you use certain syntax or built-in operations. The 'magic' is not runtime wizardry — it is a well-defined protocol baked into CPython's bytecode execution and the C-level type structure.
When you write len(obj), CPython calls type(obj).__len__(obj), not obj.__len__() directly. Python looks up the method on the class, not the instance. This distinction is critical: if you try to monkey-patch a dunder onto a single object (obj.__len__ = lambda: 42), it will have no effect when you call len(obj). The built-in function goes through the type, always.
The same dispatch applies to obj + other: Python looks up __add__ on type(obj), then __radd__ on type(other), then raises TypeError if both return NotImplemented. This two-step lookup is why int + float works: int.__add__ returns NotImplemented for floats, so Python falls back to float.__radd__. The protocol lets two types cooperate even when neither was written with the other in mind.
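The two-step lookup described above can be observed by calling the dunders by hand. This is a minimal sketch; the Meters class is a hypothetical example type, not something from the standard library.

```python
# Call the dunders directly to see the dispatch behind 1 + 2.0.
forward = (1).__add__(2.0)      # int does not know how to add a float
reflected = (2.0).__radd__(1)   # float handles the reflected call

print(forward)      # NotImplemented
print(reflected)    # 3.0
print(1 + 2.0)      # 3.0


class Meters:
    """Hypothetical unit type that only knows how to add other Meters."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented
        return Meters(self.value + other.value)


try:
    Meters(1) + "two"
    outcome = "no error"
except TypeError as error:
    # Meters.__add__ declined and str has no __radd__, so Python gives up.
    outcome = f"TypeError: {error}"

print(outcome)
```

Returning NotImplemented (rather than raising) is what gives the right-hand operand its turn before Python raises TypeError.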
A practical guide for choosing when to implement a dunder versus a regular method:
- Use a dunder when you want Python syntax (+, len(), str(), ==) to work naturally on your object.
- Use a regular named method when you need a descriptive API: add_item() is clearer than __add__ for domain-specific logic.
- Use __lt__ and related comparison dunders when you want your objects to work with sorted(), min(), and max().
- Use __call__ when you want an instance to behave like a function. This is more common in production than tutorials suggest.
The key mental model: each dunder is a slot in CPython's type structure (PyTypeObject). If the class defines the dunder, Python fills the slot. If not, the slot is NULL and the built-in raises TypeError. This is also why __getattr__ cannot intercept len() — __getattr__ operates at the Python level, but len() goes through a C-level slot that bypasses Python attribute lookup entirely.
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented  # signals: I cannot handle this type
        return Vector(self.x + other.x, self.y + other.y)

    def __radd__(self, other):
        # Called when other.__add__(self) returned NotImplemented
        return self.__add__(other)

    def __len__(self):
        # Euclidean magnitude, truncated to int (len() requires an int)
        return int((self.x**2 + self.y**2)**0.5)


v1 = Vector(3, 4)
v2 = Vector(1, 2)
print(v1 + v2)   # calls Vector.__add__ -> Vector(4, 6)
print(len(v1))   # calls Vector.__len__ -> 5

# Attempting to monkey-patch __len__ on the instance has no effect
v1.__len__ = lambda: 999
print(len(v1))   # still 5: len() goes through type(v1).__len__, not the instance
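The claim that __getattr__ cannot intercept len() is easy to check. The sketch below uses a hypothetical LoggingProxy class: explicit attribute access is forwarded, but the built-in len() ignores __getattr__ until __len__ is defined on the class itself.

```python
class LoggingProxy:
    """Hypothetical proxy that forwards missing attributes to a wrapped object."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only reached when normal lookup fails on the instance and the class.
        if name == "_target":
            raise AttributeError(name)
        print(f"forwarding {name!r}")
        return getattr(self._target, name)


proxy = LoggingProxy([1, 2, 3])

print(proxy.count(2))     # explicit access is forwarded -> 1
print(proxy.__len__())    # explicit dunder access is forwarded too -> 3

try:
    len(proxy)            # implicit lookup goes through the type, not __getattr__
    len_result = "ok"
except TypeError as error:
    len_result = f"TypeError: {error}"
print(len_result)


class SizedProxy(LoggingProxy):
    def __len__(self):
        # Defined on the class, so len() can find it.
        return len(self._target)


print(len(SizedProxy([1, 2, 3])))   # 3
```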
__getattr__ cannot intercept len() calls: __getattr__ operates at the Python attribute level, while len() goes through a C slot that bypasses Python attribute lookup entirely. Even if you override __getattribute__, built-ins like len() still use the C-level slot and ignore your override. This is surprising if you relied on it for logging. To intercept len(), override __len__ directly. To intercept all attribute access including dunders, you need a proxy class with explicitly defined dunders, not just __getattribute__. A dunder must be defined on the class for len() or + to use it; the lookup always goes through the class.
__init__ vs __new__: Object Creation Under the Hood
Most developers know __init__ as the constructor, but technically __new__ is the true constructor — it allocates the object. __init__ only initializes an already-created instance. This distinction matters when you need immutable objects or when you subclass immutable types like tuple or str.
__new__ receives the class as its first argument and must return an instance, usually by calling super().__new__(cls). If __new__ returns an instance of a different class, __init__ is NOT called. That is not a bug — it is a deliberate rule: __init__ only runs if __new__ returned an instance of the class being constructed.
In production, you rarely override __new__ unless you need a singleton, a flyweight, or to subclass an immutable type. For the vast majority of Python classes, __init__ is all you need. The cases that do call for __new__:
- Subclassing immutable types (tuple, str, int, frozenset) — __init__ cannot modify an already-created immutable, so modification must happen in __new__.
- Singleton or flyweight patterns — __new__ can return an existing cached instance.
- Factory patterns where the returned type depends on the arguments — __new__ can return a subclass instance.
One rule worth memorising: if __new__ returns an object that is not an instance of the class being constructed, __init__ is skipped. A related trick is used by serialisation code such as pickle and copy, which by default call cls.__new__(cls) directly to allocate the shell of an object and then restore its attributes without going through __init__.
class ImmutablePoint(tuple):
    """A point that IS a tuple and is therefore immutable by nature.

    We must use __new__ because tuple is immutable. By the time __init__
    would run, the tuple's content is already fixed. Trying to set values
    in __init__ would raise an error.
    """

    def __new__(cls, x, y):
        # super().__new__ allocates the tuple with the given content
        instance = super().__new__(cls, (x, y))
        return instance

    # No __init__ needed: content is fixed at allocation time

    def __repr__(self):
        return f"ImmutablePoint(x={self[0]}, y={self[1]})"


pt = ImmutablePoint(3, 4)
print(pt)                     # ImmutablePoint(x=3, y=4)
print(pt[0], pt[1])           # 3 4
print(isinstance(pt, tuple))  # True


# Demonstrates singleton pattern via __new__
class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance  # always returns the same object

    def __init__(self, value):
        self.value = value


s1 = Singleton(10)
s2 = Singleton(20)
print(s1 is s2)   # True: same object
print(s1.value)   # 20: __init__ ran twice on the same instance
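The rule about __init__ being skipped can be verified with a small sketch. Widget is a hypothetical class invented for this illustration; the second half shows the allocate-then-restore approach that copy and pickle use by default.

```python
class Widget:
    init_calls = 0

    def __new__(cls, kind):
        if kind == "text":
            return "plain text"   # a str, not a Widget: __init__ is skipped
        return super().__new__(cls)

    def __init__(self, kind):
        Widget.init_calls += 1
        self.kind = kind


a = Widget("button")
b = Widget("text")
print(type(a).__name__, Widget.init_calls)   # Widget 1
print(type(b).__name__, Widget.init_calls)   # str 1

# Allocating directly with __new__ also bypasses __init__.
shell = Widget.__new__(Widget, "button")
print(hasattr(shell, "kind"))                # False
shell.__dict__.update({"kind": "restored"})  # state restored by hand
print(shell.kind, Widget.init_calls)         # restored 1
```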
Always call super().__new__(cls) and return an instance of cls unless you have a very deliberate reason to return something else. If __new__ forgets to return the instance, the class call evaluates to None and __init__ never runs; the failure then surfaces later as an AttributeError on None rather than at instantiation time. In the Singleton pattern, note that __init__ still runs on every call even though __new__ returns the same object, so initialisation logic in __init__ will overwrite state on every construction, as shown in the example above.
__str__ vs __repr__: Which to Use for Debugging and Logging
Both __repr__ and __str__ return string representations, but the contract is different. __repr__ should be unambiguous — ideally a string you could pass to eval() to recreate the object. __str__ should be readable to an end user. Python uses __str__ for print() and str(), and __repr__ for the interactive interpreter and the f-string debug format (f"{obj!r}").
If you define only one, define __repr__. When __str__ is missing, Python falls back to __repr__. The reverse is not true: if __str__ is defined but __repr__ is not, the interactive interpreter and logging frameworks that call repr() still show the default <__main__.MyClass object at 0x...> — completely useless in a production log.
A practical rule from years of on-call experience: if you ship a class to production without __repr__, you will eventually spend time staring at a log file trying to figure out which object was which. It takes about five minutes to implement and saves hours over the lifetime of the service.
For sensitive data — passwords, API keys, personal information — mask the value in __repr__ rather than omitting it entirely. Showing User(id=42, email='a***@example.com') is far more useful for debugging than User(id=42) and still protects the data.
class User:
    def __init__(self, user_id, name, api_key):
        self.user_id = user_id
        self.name = name
        self._api_key = api_key  # sensitive: must not appear in logs

    def __repr__(self):
        # Unambiguous, shows class name and constructor arguments
        # Masks the sensitive api_key field: shows existence but not value
        return (
            f"User(user_id={self.user_id!r}, name={self.name!r}, "
            f"api_key='***')"
        )

    def __str__(self):
        # Human-readable for display, no internal details
        return f"{self.name} (ID: {self.user_id})"


u = User(42, 'Alice', 'secret-key-abc123')
print(repr(u))    # unambiguous, safe to log
print(str(u))     # user-friendly display
print(f"{u!r}")   # same as repr(u)
print(f"{u}")     # same as str(u)


# Fallback behaviour: if __str__ is missing, Python uses __repr__
class MinimalClass:
    def __repr__(self):
        return "MinimalClass()"


obj = MinimalClass()
print(str(obj))   # MinimalClass(): falls back to __repr__
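The fallback only works in one direction, which a short sketch makes visible. OnlyStr and OnlyRepr are hypothetical classes for this illustration; note that containers format their elements with repr(), which is where a missing __repr__ usually hurts in logs.

```python
class OnlyStr:
    def __str__(self):
        return "friendly text"


class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr()"


print(str(OnlyStr()))     # friendly text
print(repr(OnlyStr()))    # default form: <...OnlyStr object at 0x...>
print([OnlyStr()])        # containers use repr(), so the default form again

print(str(OnlyRepr()))    # OnlyRepr(): str() falls back to __repr__
print([OnlyRepr()])       # [OnlyRepr()]
```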
Without __repr__, repr() will show a useless memory address. This makes it impossible to distinguish objects in logs without extra context. Every class that gets passed to a logger, stored in an exception, or printed in a traceback deserves a __repr__. It is the highest-ROI dunder you can implement.
__eq__ and __hash__: The Contract That Keeps Dicts and Sets Consistent
Python's dict and set rely on hash tables. When you look up a key, Python computes its hash (via __hash__) to find the bucket, then checks equality (via __eq__) to confirm the match. The contract is simple: if two objects are equal (__eq__ returns True), their hashes MUST be equal. Break this contract and you corrupt the data structure — objects disappear, duplicates appear, and lookups return wrong results.
There are two distinct failure modes that get conflated:
Failure mode one — __eq__ without __hash__: Python implicitly sets __hash__ to None, making the object unhashable. Any attempt to add it to a set or use it as a dict key raises TypeError: unhashable type immediately. This is the loud, obvious failure.
Failure mode two — mutable hash: You implement both __eq__ and __hash__, but __hash__ is based on a mutable field. The object is inserted into a set. The field changes. Now the hash is different, the object is in the wrong bucket, and it can never be found or removed. The set appears to contain an object it cannot retrieve. This is the silent, dangerous failure — no exception, just corrupted state.
The fix for both is the same: implement __hash__ on immutable fields only, and make those the same fields that __eq__ compares.
For mutable objects where value-based equality is needed, the right answer is usually: do not implement __hash__ at all (leave it None explicitly), and use a list or a different data structure that does not require hashing. If you need to deduplicate mutable objects, extract an immutable key and deduplicate on that.
class Order:
    def __init__(self, order_id, symbol, quantity):
        self.order_id = order_id
        self.symbol = symbol
        self.quantity = quantity

    def __eq__(self, other):
        if not isinstance(other, Order):
            return NotImplemented
        return self.order_id == other.order_id

    def __hash__(self):
        # Hash only the immutable order_id, matching what __eq__ compares
        # Never hash mutable fields like quantity
        return hash(self.order_id)

    def __repr__(self):
        return f"Order(order_id={self.order_id!r}, qty={self.quantity})"


o1 = Order(1, 'AAPL', 100)
o2 = Order(1, 'AAPL', 200)  # same order_id, different quantity
order_set = {o1, o2}
print(order_set)        # one element: __eq__ and __hash__ agree
print(len(order_set))   # 1


# Demonstrating the mutable-hash silent corruption failure
class BrokenOrder:
    """Do NOT do this: __hash__ based on a mutable field."""

    def __init__(self, order_id):
        self.order_id = order_id

    def __eq__(self, other):
        return self.order_id == other.order_id

    def __hash__(self):
        return hash(self.order_id)  # dangerous if order_id can change


bad = BrokenOrder(1)
bad_set = {bad}
print(bad in bad_set)   # True before mutation
bad.order_id = 99       # mutate the field used in __hash__
print(bad in bad_set)   # False: object is lost in the wrong bucket
print(len(bad_set))     # 1: the set still 'has' it but cannot find it
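Failure mode one, and the advice to deduplicate mutable objects on an extracted immutable key, can be sketched as follows. Trade is a hypothetical record type; "later entries win" is an arbitrary policy chosen for the illustration.

```python
class Trade:
    """Hypothetical mutable record with value-based equality and no __hash__."""

    def __init__(self, trade_id, quantity):
        self.trade_id = trade_id
        self.quantity = quantity

    def __eq__(self, other):
        if not isinstance(other, Trade):
            return NotImplemented
        return (self.trade_id, self.quantity) == (other.trade_id, other.quantity)


print(Trade.__hash__)   # None: defining __eq__ removed the inherited hash

try:
    {Trade(1, 100)}
    failure = None
except TypeError as error:
    failure = str(error)   # the message mentions an unhashable type
print(failure)

# Deduplicate on an immutable key instead of hashing the mutable object.
trades = [Trade(1, 100), Trade(2, 50), Trade(1, 120)]
latest_by_id = {}
for trade in trades:
    latest_by_id[trade.trade_id] = trade   # later entries win

print(sorted(latest_by_id))       # [1, 2]
print(latest_by_id[1].quantity)   # 120
```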
__getattr__, __setattr__, and __delattr__: Attribute Access Control and Pitfalls
Python gives you fine-grained control over attribute access via three hooks: __getattr__ (fallback for failed lookups), __setattr__ (every attribute assignment), and __delattr__ (attribute deletion). __getattr__ is called only when normal attribute lookup fails — it is NOT the same as __getattribute__, which intercepts every access.
The difference matters enormously in practice. If you access an attribute that exists, __getattr__ is never called — __getattribute__ handles it. __getattr__ is specifically the fallback of last resort.
The underscore guard in the proxy example below deserves an explicit explanation because engineers routinely remove it thinking it is defensive noise. Without it, accessing self._config inside __getattr__ would trigger __getattr__ again (because _config might not exist yet during the early stages of __init__ before the line self._config = config executes). The guard raises AttributeError immediately for underscore-prefixed names, which tells Python to abort the lookup rather than recurse.
Inside __setattr__, never write self.x = value — that calls __setattr__ again and creates infinite recursion. Always delegate through super().__setattr__(name, value) or object.__setattr__(self, name, value) for attributes that need to bypass the custom logic.
class ConfigProxy:
    """A proxy that delegates attribute reads to a backing dict.

    The underscore guard in __getattr__ is NOT optional. During __init__,
    before self._config is set, any attribute access that fails normal
    lookup will trigger __getattr__. Without the guard, accessing
    self._config inside __getattr__ would recurse indefinitely. The guard
    raises AttributeError immediately for private/internal names, which
    tells Python to abort the lookup cleanly.
    """

    def __init__(self, config: dict):
        # Routes through __setattr__ below, which uses super() for _config
        self._config = config

    def __getattr__(self, name):
        # This is only called when normal lookup FAILS (attribute not found)
        # The underscore guard prevents infinite recursion during __init__
        # and for any internal attribute that may not exist yet
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self._config[name]
        except KeyError:
            raise AttributeError(f"Config key '{name}' not found")

    def __setattr__(self, name, value):
        if name == '_config':
            # Use super() for internal attributes to avoid infinite recursion
            # Writing self._config = value here would call __setattr__ again
            super().__setattr__(name, value)
        else:
            self._config[name] = value


cfg = ConfigProxy({'host': 'localhost', 'port': 5432})
print(cfg.host)      # 'localhost', found in _config via __getattr__
print(cfg.port)      # 5432, same path
cfg.timeout = 30     # routes through __setattr__ -> stored in _config
print(cfg.timeout)   # 30

try:
    print(cfg.missing_key)
except AttributeError as error:
    print(f"Correctly raised: {error}")
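The delegation rule for __setattr__ and __delattr__ can also be shown in isolation. Account is a hypothetical class for this sketch; the validation rule is an assumption chosen only to give the hooks something to do.

```python
class Account:
    """Hypothetical class that validates assignments and guards deletions."""

    def __init__(self, owner, balance):
        self.owner = owner       # routed through __setattr__ as well
        self.balance = balance

    def __setattr__(self, name, value):
        if name == "balance" and value < 0:
            raise ValueError("balance cannot be negative")
        # Delegate the actual store; writing self.<name> = value here would recurse.
        super().__setattr__(name, value)

    def __delattr__(self, name):
        if name == "balance":
            raise AttributeError("balance cannot be deleted")
        super().__delattr__(name)


acct = Account("alice", 10)
acct.balance = 25

try:
    acct.balance = -5
    rejected = False
except ValueError:
    rejected = True
print(rejected, acct.balance)    # True 25

try:
    del acct.balance
    protected = False
except AttributeError:
    protected = True
print(protected)                 # True

del acct.owner                   # unguarded names delete normally
print(hasattr(acct, "owner"))    # False
```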
Inside __setattr__, always delegate through super().__setattr__(name, value) or object.__setattr__(self, name, value). The underscore guard (raise AttributeError for names starting with underscore) is your safety net for early-construction scenarios. If a key may legitimately be absent, expose a get() method with an explicit default argument; do not hide missing configuration inside attribute lookup. In short: __setattr__ must delegate through super() to avoid infinite recursion, and the underscore guard in __getattr__ prevents recursion during early object construction, so do not remove it.
__call__, __enter__, __exit__, and __del__: The Production Dunders Nobody Teaches
Most tutorials cover __init__, __repr__, and __eq__ and stop there. But four other dunders appear constantly in production Python and deserve explicit coverage.
__call__ makes any instance behave like a function. Any class with __call__ defined can be invoked with parentheses — instance(args). This is how decorators implemented as classes work, how stateful callables maintain configuration, and how middleware layers stay composable without inheritance. In 2026, __call__ is especially common in ML inference pipelines where a model object is callable, and in dependency injection containers that need to produce objects on demand.
__enter__ and __exit__ power the with statement. __enter__ is called at the start of a with block and its return value is bound to the as variable. __exit__ is called at the end, whether the block exits normally or through an exception. __exit__ receives the exception type, value, and traceback — if it returns True, the exception is suppressed; if it returns False or None, the exception propagates. This is the correct pattern for resource management in Python, replacing try/finally in almost every case.
__del__ is the finalizer. In CPython it normally runs when an object's reference count drops to zero. It looks like the right place for cleanup, but it is unreliable in practice: objects caught in reference cycles are finalized only when the cycle garbage collector eventually runs, PyPy and other implementations without reference counting call it at an unpredictable later time, and it is not guaranteed to run at all for objects still alive at interpreter shutdown. The rule is simple: if a resource must be released, use a context manager (__enter__/__exit__) or an explicit close() method. Reserve __del__ for logging or debugging only, never for resource release that must happen.
import time


# --- __call__: making an instance callable ---
class RateLimiter:
    """A callable that enforces a minimum interval between calls.

    __call__ makes the instance usable as a function or decorator.
    """

    def __init__(self, min_interval_seconds: float):
        self._min_interval = min_interval_seconds
        self._last_called = 0.0

    def __call__(self, func):
        """Wraps a function with rate limiting."""
        def wrapper(*args, **kwargs):
            elapsed = time.monotonic() - self._last_called
            if elapsed < self._min_interval:
                time.sleep(self._min_interval - elapsed)
            self._last_called = time.monotonic()
            return func(*args, **kwargs)
        return wrapper

    def __repr__(self):
        return f"RateLimiter(min_interval={self._min_interval}s)"


# --- __enter__ and __exit__: context manager protocol ---
class DatabaseConnection:
    """Manages a database connection lifecycle.

    __enter__ opens the connection; __exit__ closes it regardless of
    exceptions. This is the correct pattern for resource management,
    not __del__.
    """

    def __init__(self, connection_string: str):
        self._connection_string = connection_string
        self._connection = None

    def __enter__(self):
        print(f"Opening connection to {self._connection_string}")
        self._connection = f"<Connection:{self._connection_string}>"  # simulated
        return self  # bound to the 'as' variable in the with block

    def __exit__(self, exc_type, exc_val, exc_tb):
        print(f"Closing connection (exc_type={exc_type})")
        self._connection = None
        # Return False (or None) to let exceptions propagate
        # Return True only if you intentionally want to suppress them
        return False

    def query(self, sql: str) -> str:
        if self._connection is None:
            raise RuntimeError("Not connected. Use as a context manager.")
        return f"Result of [{sql}] on {self._connection}"

    def __repr__(self):
        return f"DatabaseConnection({self._connection_string!r})"


# __call__ in action
limiter = RateLimiter(0.0)  # zero interval for demo
print(f"RateLimiter is callable: {callable(limiter)}")


@limiter
def fetch_data():
    return "data"


result = fetch_data()
print(f"fetch_data result: {result}")

# __enter__ and __exit__ in action
with DatabaseConnection("postgres://localhost/mydb") as db:
    print(db.query("SELECT 1"))


# __del__ warning: never rely on it for resource cleanup
class LeakyResource:
    def __del__(self):
        # This may run late or not at all (PyPy, interpreter shutdown,
        # reference cycles). Use a context manager instead.
        print("__del__ called, do not rely on this for real cleanup")
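The return value of __exit__ decides whether an exception is swallowed, which the following sketch makes explicit. Suppress is a hand-written stand-in that mirrors the idea behind the standard library's contextlib.suppress; it is written out here only to show the three __exit__ arguments in use.

```python
class Suppress:
    """Minimal illustration of exception suppression via __exit__."""

    def __init__(self, *exception_types):
        self._exception_types = exception_types
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # exc_type is None when the block finished without raising.
        if exc_type is not None and issubclass(exc_type, self._exception_types):
            self.caught = exc_val
            return True    # swallow the listed exception types
        return False       # let everything else propagate


with Suppress(KeyError) as guard:
    {}["missing"]
print(repr(guard.caught))    # KeyError('missing')

try:
    with Suppress(KeyError):
        int("not a number")
    propagated = False
except ValueError:
    propagated = True
print(propagated)            # True: ValueError was not in the list
```

Suppressing should stay the exception rather than the rule; returning False or None keeps failures visible to the caller.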
Release resources with a context manager or an explicit close() method. __del__ is for optional cleanup at best, and for logging at worst.
__slots__: Memory Efficiency, Inheritance Rules, and @dataclass(slots=True)
__slots__ is a declarative way to prevent the creation of a per-instance __dict__, saving memory and speeding up attribute access. Each slot reserves space for an attribute as a direct pointer in a fixed-size array, bypassing the dict lookup entirely.
The memory saving is concrete. A regular class instance in CPython carries per-instance attribute storage in the form of a __dict__ on top of the standard instance overhead. A slotted instance replaces that with a fixed array of pointers. Treat the 40–60% figure as a rough guide: the exact saving depends on the CPython version and the number of attributes, so measure on your own workload. For a service that creates millions of geolocation points or financial tick records, that difference matters.
The complexity cost is real too. Every class in the inheritance hierarchy must define its own __slots__. A subclass that does not define __slots__ will have a __dict__ anyway, defeating the memory saving. Multiple inheritance with __slots__ requires careful coordination: if two parent classes both define non-empty __slots__, the class statement fails with a TypeError about an instance lay-out conflict, so at most one base may carry non-empty slots and the others should be mixins with empty __slots__.
Private attributes with name mangling work with slots, but the naming is easy to get confused about. CPython mangles double-underscore slot names the same way it mangles attribute names, so in a class Foo the slot '__price' and the slot '_Foo__price' both back self.__price. Spelling out the mangled name makes the real attribute name visible, but it ties the slot to the class name and breaks quietly if the class is renamed.
In Python 3.10+, @dataclass(slots=True) generates __slots__ from the field annotations automatically and avoids most of the manual slot management pitfalls. This is the recommended way to use slots for most teams in 2026.
from dataclasses import dataclass


# --- Manual __slots__ ---
class Point2D:
    __slots__ = ('x', 'y')

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point2D(x={self.x}, y={self.y})"


# --- @dataclass(slots=True), Python 3.10+: the modern way ---
@dataclass(slots=True)
class Point2DDataclass:
    x: float
    y: float


# --- Comparison ---
manual_point = Point2D(3.0, 4.0)
dataclass_point = Point2DDataclass(3.0, 4.0)
print(f"Manual slots instance: {manual_point}")
print(f"Dataclass slots instance: {dataclass_point}")

# __slots__ prevents arbitrary attribute creation, which doubles as a safety feature
try:
    manual_point.z = 5.0
except AttributeError as error:
    print(f"Slot safety caught it: {error}")


# Without __slots__, this would silently create a new attribute
class RegularPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


rp = RegularPoint(3.0, 4.0)
rp.z = 5.0  # silently creates a new attribute, no error
print(f"Regular class allows accidental attr: {rp.z}")


# Private attribute stored in a slot
class PricedItem:
    # Explicit mangled name; writing '__price' here would be mangled to the same slot
    __slots__ = ('name', '_PricedItem__price')

    def __init__(self, name: str, price: float):
        self.name = name
        self.__price = price  # stored in the '_PricedItem__price' slot

    @property
    def price(self):
        return self.__price


item = PricedItem("Widget", 9.99)
print(f"Price via property: {item.price}")
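The inheritance rules for __slots__ are easiest to trust after seeing them fail. The classes below are hypothetical and exist only to show three cases: a subclass that forgets __slots__, a subclass that keeps them, and two bases with non-empty slots.

```python
class Base:
    __slots__ = ("x",)


class WithDict(Base):
    pass                     # no __slots__: instances get a __dict__ again


class StillSlotted(Base):
    __slots__ = ("y",)       # list only the names this class adds


print(hasattr(WithDict(), "__dict__"))       # True
print(hasattr(StillSlotted(), "__dict__"))   # False


class Other:
    __slots__ = ("z",)


try:
    class Both(Base, Other):   # two bases with non-empty slots
        __slots__ = ()
    conflict = None
except TypeError as error:
    conflict = str(error)
print(conflict)               # reports an instance lay-out conflict


class Mixin:
    __slots__ = ()            # empty slots make a safe mixin


class Combined(Base, Mixin):
    __slots__ = ()


combined = Combined()
combined.x = 1
print(combined.x, hasattr(combined, "__dict__"))   # 1 False
```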
Operator Overloading: Why Your Custom Objects Should Behave Like Built-ins
Magic methods aren't just for object setup and teardown. They let your custom objects participate in Python's operator system — arithmetic, comparison, indexing, and membership checks. This is how you make your classes feel native.
The real production win: when you define __add__, __sub__, or __mul__ on a vector or money type, you eliminate entire classes of bugs. Instead of calling obj.add(other) and hoping the internals don't mutate state, you get obj + other which returns a new instance. Immutability by contract.
Python dispatches these operators through the number-protocol slots on the operand types. When the interpreter sees a + b, it tries a.__add__(b). If that returns NotImplemented, it tries b.__radd__(a). If both return NotImplemented, Python raises the familiar TypeError: unsupported operand type(s). This is why you sometimes see __radd__: it handles the case where your type is on the right side of the operator.
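Comparison operators follow the same pattern but with a twist: __eq__ returning NotImplemented triggers the reflected call on the other operand, and if both sides decline, == falls back to identity comparison instead of raising. Don't return False when you mean "I don't know"; that short-circuits the reflected call and denies the other type its chance to answer.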
class Vector2D:
    def __init__(self, x: float, y: float):
        self._x = x
        self._y = y

    def __add__(self, other: 'Vector2D') -> 'Vector2D':
        if not isinstance(other, Vector2D):
            return NotImplemented
        return Vector2D(self._x + other._x, self._y + other._y)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Vector2D):
            return NotImplemented
        return self._x == other._x and self._y == other._y

    def __hash__(self) -> int:
        # Defining __eq__ alone would make the class unhashable; hash the
        # same fields __eq__ compares and treat them as read-only.
        return hash((self._x, self._y))

    def __repr__(self) -> str:
        return f"Vector2D(x={self._x}, y={self._y})"


v1 = Vector2D(3, 4)
v2 = Vector2D(1, 2)
print(v1 + v2)    # Vector2D(x=4, y=6)
print(v1 == v2)   # False
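Reflected operators matter as soon as a built-in sits on the left-hand side. The sketch below uses a hypothetical Money type stored as integer cents; treating 0 as the additive identity in __radd__ is an assumption made so that sum() works.

```python
class Money:
    """Hypothetical immutable money amount stored as integer cents."""

    def __init__(self, cents):
        self._cents = cents

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self._cents + other._cents)

    def __radd__(self, other):
        # sum() starts from the int 0, so 0 + Money lands here.
        if other == 0:
            return self
        return NotImplemented

    def __mul__(self, factor):
        if not isinstance(factor, int):
            return NotImplemented
        return Money(self._cents * factor)

    # int.__mul__(3, money) returns NotImplemented, so Python tries this.
    __rmul__ = __mul__

    def __repr__(self):
        return f"Money(cents={self._cents})"


print(3 * Money(250))                   # Money(cents=750)
print(sum([Money(100), Money(50)]))     # Money(cents=150)

try:
    Money(100) * 1.5
    float_result = "ok"
except TypeError as error:
    float_result = f"TypeError: {error}"
print(float_result)
```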
Return NotImplemented, not False or None, when the other operand type is unsupported. False breaks the fallback chain and can silently produce wrong comparison results.
Membership Operators: The __contains__ Shortcut You Didn't Know You Had
Ever written if item in my_collection: and wondered how Python decides? It calls __contains__. If you don't define it, Python falls back to iterating with __iter__ and comparing each element with ==. That's O(n) even for an O(1) data structure.
Define __contains__ when your class wraps a set, dict, or any fast-lookup structure. It's a single method that turns your objects into first-class citizens for the in operator. The payoff: cleaner code and potential performance wins.
Here's the CPython detail you won't find in beginner blogs: Python checks __contains__ before __iter__. If __contains__ is defined, the interpreter calls it and converts the result to a bool; any exception it raises, including TypeError, propagates to the caller. Only when __contains__ is not defined does Python fall back to iteration (via __iter__, or the older __getitem__ protocol). The logic lives in PySequence_Contains in Objects/abstract.c.
A second detail: __contains__ also powers not in, which is simply the negation of in. No extra method is needed. Because the result is coerced to a bool, returning None reads as False and any nonzero integer reads as True, so return an explicit bool to keep the intent obvious.
class WhiteList:
    def __init__(self, items: set):
        self._items = items

    def __contains__(self, item: object) -> bool:
        # Direct set membership: O(1)
        return item in self._items

    # Without __contains__, Python would iterate:
    #   for i in self._items:
    #       if i == item: return True
    # That's O(n) even for a set, so don't do it.


allowed = WhiteList({'alice', 'bob', 'charlie'})
print('alice' in allowed)     # True
print('zoe' not in allowed)   # True
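The cost of the iteration fallback can be made visible by counting how many elements a membership test touches. CountingBag and IndexedBag are hypothetical classes written for this sketch.

```python
class CountingBag:
    """Hypothetical container with __iter__ only; counts elements inspected."""

    def __init__(self, items):
        self._items = list(items)
        self.inspected = 0

    def __iter__(self):
        for item in self._items:
            self.inspected += 1
            yield item


class IndexedBag(CountingBag):
    """Same container plus a set-backed __contains__."""

    def __init__(self, items):
        super().__init__(items)
        self._index = set(self._items)

    def __contains__(self, item):
        return item in self._index    # never touches __iter__


slow = CountingBag(range(1000))
print(999 in slow, slow.inspected)    # True 1000

fast = IndexedBag(range(1000))
print(999 in fast, fast.inspected)    # True 0
print("x" not in fast)                # True
```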
If your class wraps a set or dict, delegate __contains__ directly to it. No loops, no conditionals; just return item in self._container.
Supporting Iteration With __iter__ and __next__: Stop Hacking Lists
Every Python dev has written a custom collection and reached for a list. That works, but it's wrong. Lists are storage, not iteration policy. If your object represents a deck of cards, a log stream, or a paginated API response, you need controlled iteration.
The contract is brutal but tiny: implement __iter__ to return an iterator, and __next__ on that iterator to raise StopIteration when done. That's it. No inheritance from list. No storing a cursor index like it's 1999. Returning self from __iter__ means your object is its own iterator, fine for simple cases. Returning a separate iterator object is better when you need multiple independent passes.
The payoff? Your object works with for loops, comprehensions, tuple unpacking, and any function that expects an iterable. The Python runtime does the rest. Stop fighting the protocol and start shipping code that behaves like a built-in.
class Deck:
    # Thirteen ranks, including the 10, for a full 52-card deck
    RANKS = "A 2 3 4 5 6 7 8 9 10 J Q K".split()

    def __init__(self):
        self.cards = [f"{r}{s}" for s in "♠♥♦♣" for r in self.RANKS]

    def __iter__(self):
        # Delegate to the list's iterator: a fresh one on every pass
        return iter(self.cards)


class LogStream:
    def __init__(self, lines):
        self._lines = lines
        self._index = 0

    def __iter__(self):
        # The object is its own iterator, so it can be consumed only once
        return self

    def __next__(self):
        if self._index >= len(self._lines):
            raise StopIteration
        result = self._lines[self._index]
        self._index += 1
        return result


for card in Deck():
    print(card, end=" ")
print()

for log in LogStream(["ERR: timeout", "INFO: retry", "OK: 200"]):
    print(log)
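The difference between "the object is its own iterator" and "the object hands out separate iterators" shows up when the same object is looped over more than once. Countdown and CountdownIterator are hypothetical names used for this sketch.

```python
class Countdown:
    """Hypothetical iterable that hands out a fresh iterator on every pass."""

    def __init__(self, start):
        self._start = start

    def __iter__(self):
        return CountdownIterator(self._start)


class CountdownIterator:
    def __init__(self, current):
        self._current = current

    def __iter__(self):
        return self    # iterators are themselves iterable

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


countdown = Countdown(3)
print(list(countdown))    # [3, 2, 1]
print(list(countdown))    # [3, 2, 1] again: each pass gets its own state

# Nested loops over the same object work because the passes are independent.
pairs = [(a, b) for a in countdown for b in countdown]
print(len(pairs))         # 9

single_pass = iter(countdown)
print(list(single_pass))  # [3, 2, 1]
print(list(single_pass))  # []: an iterator is exhausted after one pass
```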
Building Iterables With __getitem__: The Lazy Dev's Protocol
You don't always need __iter__. Python's iteration protocol has a fallback path that checks for __getitem__. If your object supports integer indexing and raises IndexError when out of range, the for loop works automatically. This is the oldest magic method trick in the book, and it's perfect for objects that are fundamentally indexable.
Why would you use this instead of the full __iter__/__next__ contract? Because it's less code. If your object already has a natural ordering and you can describe element access by position, __getitem__ gives you iteration for free. No iterator state. No StopIteration boilerplate. Just implement __getitem__(self, index) and raise IndexError when done.
Downside: you don't get fine-grained control over iteration direction or state. But for paginated data, read-only sequences, or anything that maps cleanly to an array, this is the senior shortcut. The interpreter handles the rest. Ship it.
    # io.thecodeforge - python tutorial
    class PageFetcher:
        def __init__(self, base_url, page_size=10):
            self.base_url = base_url
            self.page_size = page_size

        def __getitem__(self, index):
            # The for loop calls this with 0, 1, 2, ... until IndexError
            if not 0 <= index < 3:
                raise IndexError("no more pages")
            return self._fetch_page(index)

        def _fetch_page(self, page_num):
            # Simulate API call
            return [f"item-{page_num}-{i}" for i in range(self.page_size)]

    pages = PageFetcher("https://api.example.com/items")
    for page in pages:
        print(f"Got page: {len(page)} items")

    # Manual indexing works too
    print(pages[1])
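The same fallback also powers list(), sum(), and the in operator, not just for loops. A minimal sketch, assuming a hypothetical Squares class that defines nothing but __getitem__:

```python
class Squares:
    """Hypothetical read-only sequence that defines only __getitem__."""

    def __init__(self, count):
        self._count = count

    def __getitem__(self, index):
        # The fallback protocol asks for 0, 1, 2, ... until IndexError
        if not 0 <= index < self._count:
            raise IndexError(index)
        return index * index


squares = Squares(5)
as_list = list(squares)     # iteration via __getitem__
total = sum(squares)        # any consumer of iterables works
has_nine = 9 in squares     # membership falls back to a linear scan
has_ten = 10 in squares     # scan ends when IndexError is raised

print(as_list, total, has_nine, has_ten)
```

Note that membership here is a linear scan. If lookups matter, add __contains__ on top.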
String Representation With __repr__ and __str__
Python uses dunder methods to decide how your objects appear as strings. The __repr__ method returns an unambiguous, developer-facing string that should ideally be valid Python code to recreate the object. The __str__ method returns a readable, user-facing string that Python calls when you print() an object or pass it to str(). If __str__ is missing, Python falls back to __repr__. Neglecting __repr__ makes debugging painful — you see <MyObject object at 0x...> instead of meaningful data. Every production class should define __repr__ at minimum. The rule: __repr__ is for other developers, __str__ is for end users.
    # io.thecodeforge - python tutorial
    class User:
        def __init__(self, name, uid):
            self.name = name
            self.uid = uid

        def __repr__(self):
            # !r quotes and escapes the name so the output stays valid Python
            return f"User({self.name!r}, {self.uid})"

        def __str__(self):
            return f"User {self.name} (ID: {self.uid})"

    user = User("Alice", 42)
    print(repr(user))  # Unambiguous, recreatable
    print(str(user))   # Human-readable
    print(user)        # print() calls __str__, so same output as str(user)
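The fallback direction is worth seeing once: str() falls back to __repr__, but never the other way around. A minimal sketch, using hypothetical Point and Plain classes:

```python
class Point:
    """Defines only __repr__; str() and print() fall back to it."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x!r}, y={self.y!r})"


class Plain:
    """Defines neither method, so the default object repr is used."""


p = Point(1, 2)
as_repr = repr(p)
as_str = str(p)              # no __str__, so this uses __repr__
in_container = str([p])      # containers always use repr() for their items
default_repr = repr(Plain())

print(as_repr, as_str, in_container, default_repr)
```

Containers calling repr() on their items is the practical reason to implement __repr__ first: log lines that print lists or dicts of your objects stay readable.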
Without __repr__, repr() will show the default <__main__.User object at 0x...>.

Introduction to Statistics With Python Built-ins
Before reaching for NumPy, know that Python's standard library has a statistics module for common tasks. The module provides mean(), median(), mode(), stdev(), and variance() with zero third-party dependencies. These functions accept iterables of numbers and raise StatisticsError on invalid input (like empty data). The return type follows the input: float data gives floats, while a call like mean([1, 2, 3]) returns an int. Use them for quick exploratory analysis, test assertions, or simple reporting. For large datasets or performance-critical code, switch to NumPy. The key insight: for most day-to-day statistical needs, Python's built-in statistics module is enough and avoids adding a heavy dependency for trivial operations.
    # io.thecodeforge - python tutorial
    import statistics

    data = [2.5, 3.1, 4.0, 3.1, 5.2]

    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    stdev = statistics.stdev(data)  # sample standard deviation

    print(f"Mean: {mean:.2f}")
    print(f"Median: {median:.2f}")
    print(f"Mode: {mode:.2f}")
    print(f"Std Dev: {stdev:.2f}")
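Two details tend to bite in practice: the sample versus population distinction, and the exception raised on empty input. A small sketch that reuses the same data list as above:

```python
import statistics

data = [2.5, 3.1, 4.0, 3.1, 5.2]

# stdev() divides by n - 1 (sample); pstdev() divides by n (population)
sample_sd = statistics.stdev(data)
population_sd = statistics.pstdev(data)

# Empty input raises StatisticsError instead of returning NaN or 0
try:
    statistics.mean([])
    empty_failed = False
except statistics.StatisticsError:
    empty_failed = True

print(f"sample={sample_sd:.4f} population={population_sd:.4f}")
print(f"empty input raised: {empty_failed}")
```

For the same data, the sample figure is always the larger of the two, so mixing them up skews thresholds and test assertions.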
statistics.stdev() computes the sample standard deviation; for a full population, use statistics.pstdev().

The Silent Set Corruption: When __eq__ Without __hash__ Breaks Production
- Never implement __eq__ without __hash__ unless you explicitly want unhashable instances — and if you want unhashable, set __hash__ = None explicitly so the intent is clear.
- Use @dataclass(frozen=True) for value objects to avoid manual boilerplate and to enforce immutability.
- Add a unit test that inserts two equal objects into a set and checks the set has exactly one element. That test would have caught this on day one.
- When a TypeError surfaces in production and you apply a workaround under pressure, the root cause investigation still matters. The workaround may introduce a second bug that is harder to find.
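The lessons above can be sketched in one file. All class names here (BrokenOrder, Order, FrozenOrder, MutableKeyOrder) are hypothetical stand-ins, not the code from the incident; the sketch only assumes an order is identified by an integer order_id.

```python
from dataclasses import dataclass


class BrokenOrder:
    """__eq__ without __hash__: Python sets __hash__ to None."""

    def __init__(self, order_id):
        self.order_id = order_id

    def __eq__(self, other):
        if not isinstance(other, BrokenOrder):
            return NotImplemented
        return self.order_id == other.order_id


class Order:
    """__eq__ and __hash__ agree and are based on a read-only field."""

    def __init__(self, order_id):
        self._order_id = order_id

    @property
    def order_id(self):
        return self._order_id

    def __eq__(self, other):
        if not isinstance(other, Order):
            return NotImplemented
        return self._order_id == other._order_id

    def __hash__(self):
        return hash(self._order_id)


@dataclass(frozen=True)
class FrozenOrder:
    """frozen=True generates __eq__ and __hash__ and blocks mutation."""
    order_id: int


class MutableKeyOrder(Order):
    """The 'workaround' trap: hashable, but the hashed field can change."""

    def set_id(self, new_id):
        self._order_id = new_id


# Failure mode 1: loud. The object cannot enter a set at all.
try:
    {BrokenOrder(1)}
    broken_is_hashable = True
except TypeError:
    broken_is_hashable = False

# Failure mode 2: silent. The hash changes after insertion.
order = MutableKeyOrder(1)
orders = {order}
order.set_id(99)
lost_in_set = order not in orders        # lookup probes the wrong bucket
still_stored = order in list(orders)     # yet the object is still in there

# The day-one unit test from the list above
assert len({Order(1), Order(1)}) == 1
assert len({FrozenOrder(1), FrozenOrder(1)}) == 1
print(broken_is_hashable, lost_in_set, still_stored)
```

The second failure mode is the one that costs days: nothing raises, the set still reports the right length, and membership checks quietly return False.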
If __len__ was attached to an instance rather than the class, len() will not find it because Python looks up dunders on the type, not the object. Also confirm the method returns an integer: returning a float raises TypeError.

    python -c "from mymodule import MyClass; print(hasattr(MyClass, '__len__'))"
    python -c "from mymodule import MyClass; obj = MyClass(); print(type(obj).__len__(obj))"

len() raises TypeError.

    python -c "from mymodule import Order; print(Order.__hash__)"
    python -c "from mymodule import Order; o = Order(1); print(hash(o))"
    python -c "from mymodule import Order; o = Order(1); print(hash(o)); o.order_id = 2; print(hash(o))"
    python -c "from mymodule import Order; s = {Order(1)}; o = list(s)[0]; o.order_id = 99; print(o in s)"

    python -c "from mymodule import MyClass; print(MyClass.__slots__)"
    python -c "from mymodule import MyClass; print([a for a in dir(MyClass) if not a.startswith('__')])"

| Category | Magic Methods | Common Use Case |
|---|---|---|
| Object Creation | __new__, __init__, __del__ | Custom initialization, singletons, immutable subclasses. Never use __del__ for resource cleanup — it is unreliable. |
| String Representation | __repr__, __str__, __format__ | Debugging, logging, and display. __repr__ is mandatory; __str__ is optional. Always implement __repr__. |
| Comparison and Hashing | __eq__, __ne__, __lt__, __le__, __gt__, __ge__, __hash__ | Custom equality, sorting, and use in sets and dicts. __eq__ and __hash__ must agree — break this and data structures corrupt silently. |
| Numeric Operators | __add__, __sub__, __mul__, __truediv__, __radd__, __iadd__ | Custom arithmetic. For unsupported operand types, return NotImplemented rather than raising TypeError, so Python can try the other operand's reflected method. |
| Attribute Access | __getattr__, __getattribute__, __setattr__, __delattr__ | Lazy loading, proxies, access control. __getattr__ is fallback; __getattribute__ is all-access — know the difference. |
| Container Emulation | __len__, __getitem__, __setitem__, __delitem__, __contains__ | Custom collections, sequences, and mappings. |
| Callable | __call__ | Makes instances callable like functions. Used in decorators, stateful pipelines, ML model wrappers, and middleware. |
| Context Manager | __enter__, __exit__ | Resource management with the with statement. The correct alternative to __del__ for guaranteed cleanup. |
| Iteration | __iter__, __next__ | Custom iterators and generators. __iter__ returns self (or a separate iterator); __next__ returns the next value or raises StopIteration. |
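The NotImplemented rule from the Numeric Operators row is the one most often gotten wrong, so here is a minimal sketch. Money is a hypothetical class storing an integer number of cents; treating 0 + Money as valid is a design assumption made so that sum() works.

```python
class Money:
    """Hypothetical value type holding an integer number of cents."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        # Signal "I can't handle this" so Python can try other.__radd__
        return NotImplemented

    def __radd__(self, other):
        # sum() starts from the int 0, so accept 0 + Money
        if isinstance(other, int) and other == 0:
            return self
        return NotImplemented


total_cents = sum([Money(150), Money(250)]).cents

# Both sides returned NotImplemented, so Python raises TypeError itself
try:
    Money(100) + "5"
    mixed_failed = False
except TypeError:
    mixed_failed = True

print(total_cents, mixed_failed)
```

Raising TypeError directly inside __add__ would cut the dispatch short and prevent another type's __radd__ from ever getting a chance.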
Common mistakes to avoid (7 patterns)
- Defining __eq__ without __hash__
- Using self.attr = value inside __setattr__ without delegating to super(). Use super().__setattr__(name, value) for internal assignments inside __setattr__.
- Assuming __getattr__ intercepts all attribute access. To intercept every access, call super().__getattribute__(name) inside __getattribute__ and handle AttributeError to delegate to __getattr__.
- Using __slots__ without accounting for mangled private attribute names
- Using __del__ for resource cleanup. If a close() method is needed, document it clearly and never rely on __del__ to call it.
- Not implementing __repr__ in production classes
- Monkey-patching a dunder on an instance expecting built-in functions to use it
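Two of these mistakes fit in a short sketch: the __setattr__ delegation and the instance-level monkey-patch. Audited and Box are hypothetical classes written for this illustration.

```python
class Audited:
    """Records the name of every attribute that gets assigned."""

    def __init__(self):
        # Bypass our own hook while the log itself is being created
        super().__setattr__("changes", [])

    def __setattr__(self, name, value):
        self.changes.append(name)
        # Writing self.<name> = value here would re-enter __setattr__ forever
        super().__setattr__(name, value)


class Box:
    def __init__(self, items):
        self.items = items


audited = Audited()
audited.x = 1
audited.y = 2

box = Box([1, 2, 3])
box.__len__ = lambda: 3          # patched on the instance only

try:
    len(box)
    instance_patch_works = True
except TypeError:
    # len() looks up __len__ on type(box), which has none yet
    instance_patch_works = False

Box.__len__ = lambda self: len(self.items)   # patched on the class
class_patch_result = len(box)

print(audited.changes, instance_patch_works, class_patch_result)
```

The instance attribute box.__len__ still exists and can be called by hand, which is exactly why this bug survives casual testing.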
Interview Questions on This Topic
What is the difference between __new__ and __init__? When would you override __new__?
Why is it important to implement __hash__ when you implement __eq__? What are the two failure modes if you get this wrong?
Explain how __slots__ works under the hood. What are the performance benefits and the limitations in 2026?
What is the difference between __getattr__ and __getattribute__? When would you use each?
How does Python decide which __add__ or __radd__ to call when evaluating a + b?
What is __call__ and give a production use case where implementing it is better than using a plain function.
Frequently Asked Questions
Start with __repr__. It provides an unambiguous string representation that helps immensely during debugging and logging. If you implement only one dunder, make it __repr__. Include the class name and constructor arguments, and mask any sensitive fields rather than omitting them.
You cannot monkey-patch dunder methods on built-in types because Python looks them up on the type, not the instance, and built-in types are implemented in C with immutable type structures. You can subclass the built-in type and override the dunder on the subclass.
Python automatically sets __hash__ to None, making your objects unhashable. Attempting to add them to a set or use them as dict keys raises TypeError: unhashable type. If you want objects to be explicitly unhashable, set __hash__ = None yourself so the intent is documented in the code.
For simple data containers, prefer @dataclass (or @dataclass(frozen=True) for immutable value objects). It automatically generates __init__, __repr__, __eq__, and optionally __hash__. In Python 3.10+, @dataclass(slots=True) also generates __slots__ correctly. Use manual implementation when you need custom behaviour that dataclass cannot express, or when inheriting from a complex class hierarchy that predates dataclasses.
The most common cause: inside __getattr__, you access self.some_attr where some_attr does not exist yet (usually during early __init__). That triggers __getattr__ again, recursively. The fix is the underscore guard: raise AttributeError immediately for any name starting with underscore. This stops the recursion because Python treats the AttributeError as a clean lookup failure rather than a reason to recurse. A second cause: __getattr__ accesses self._backing_dict before that attribute is set, triggering the same loop. Always set backing attributes using super().__setattr__ or object.__setattr__ in __init__ to bypass __setattr__ hooks and ensure they exist before __getattr__ can be called.
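A minimal sketch of the underscore guard, assuming a hypothetical LazyConfig class that serves attributes from a backing dict named _values:

```python
class LazyConfig:
    """Hypothetical proxy that serves attributes from a backing dict."""

    def __init__(self, values):
        # Set the backing attribute directly; this would also bypass a
        # custom __setattr__ hook if the class had one
        object.__setattr__(self, "_values", dict(values))

    def __getattr__(self, name):
        # Only reached when normal attribute lookup has already failed
        if name.startswith("_"):
            # Underscore guard: never try to resolve internals through here
            raise AttributeError(name)
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None


config = LazyConfig({"timeout": 30})
timeout = config.timeout
has_missing = hasattr(config, "missing")

# Simulate a half-built object: __init__ never ran, so _values is absent.
# Without the guard, self._values would re-enter __getattr__ endlessly.
half_built = LazyConfig.__new__(LazyConfig)
half_built_lookup = hasattr(half_built, "timeout")

print(timeout, has_missing, half_built_lookup)
```

Half-built objects are not exotic: copy and pickle create instances without calling __init__, which is a typical place where this recursion first shows up.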
Almost never. __del__ is called when an object's reference count drops to zero, but it is not guaranteed to run in the presence of reference cycles (CPython's cycle garbage collector handles those separately), is never guaranteed in PyPy, and may not run at all during interpreter shutdown. If a resource must be released — file handles, sockets, database connections, GPU memory — use a context manager (__enter__/__exit__) with a with statement. If you need an explicit close() method, document it clearly. __del__ is acceptable only for optional logging or debugging, never for correctness-critical cleanup.
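A minimal sketch of the context-manager alternative. Connection is a hypothetical resource; the is_open flag stands in for a real socket or file handle.

```python
class Connection:
    """Hypothetical resource with deterministic cleanup."""

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def __enter__(self):
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions alike
        self.close()
        return False  # do not swallow the exception

    def close(self):
        """Explicit close for callers that cannot use a with block."""
        self.is_open = False


conn = Connection("primary")
try:
    with conn:
        opened_inside = conn.is_open
        raise RuntimeError("query failed")
except RuntimeError:
    error_propagated = True

closed_after = not conn.is_open
print(opened_inside, error_propagated, closed_after)
```

Cleanup here happens at a known line, on every interpreter, whether or not the block raised. None of that holds for __del__.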