Java Deadlock — Lock Ordering Failures in Payment Systems
Thread dumps revealed BLOCKED threads holding Account and Transaction locks in opposite order.
20+ years shipping production Java in banking & fintech. Written from production experience, not tutorials.
- Core concept: Two or more threads wait forever for locks held by each other — no progress, no error.
- Four conditions: Mutual exclusion, hold-and-wait, no preemption, circular wait — break any one.
- Detection: jstack shows BLOCKED threads; ThreadMXBean findMonitorDeadlockedThreads() works at runtime.
- Production reality: Service stalls silently — thread dump is the only reliable signal.
- Biggest mistake: Assuming synchronized blocks are safe — they enforce mutual exclusion but don't prevent circular waits.
Imagine two kids at a dinner table. Kid A grabs the ketchup and won't let go until Kid B passes the mustard. Kid B grabs the mustard and won't let go until Kid A passes the ketchup. Neither moves. They're stuck forever — that's a deadlock. In Java, threads do the exact same thing with locks: each holds one resource and waits for another that's already taken, and the program freezes silently.
Deadlock is the silent killer of Java applications. Your service passes all tests, deploys without a hitch, handles load fine for three hours — then suddenly stops responding. No exception, no crash, no log entry. Threads are alive but doing absolutely nothing. On-call engineers restart the JVM, the problem vanishes, and nobody knows why. This is deadlock's calling card, and it happens in real production systems far more often than most teams admit.
The core problem deadlock exploits is that mutual exclusion — the guarantee that only one thread can hold a lock at a time — is both essential for correctness and dangerous when combined with circular waiting. Java's synchronized keyword and ReentrantLock both provide mutual exclusion, but neither prevents you from building a cycle of waiting threads. The JVM won't throw an exception. It won't log a warning. It will simply let your threads sit there forever, holding resources that other threads desperately need.
By the end of this article you'll be able to read a thread dump and spot a deadlock in under 60 seconds, reproduce a deadlock deliberately to understand the mechanism, use ThreadMXBean to detect deadlocks programmatically at runtime, and apply three concrete prevention strategies (lock ordering, tryLock with timeout, and lock-free data structures) that actually work in production. You'll also know which of those strategies to reach for depending on your specific situation.
What Is Deadlock in Java?
Deadlock is a concurrency failure where two or more threads are blocked forever, each waiting for a resource that another thread holds. It's not a bug in the JVM — it's a bug in your code. The threads don't crash, they don't throw exceptions, they just stop.
Here's the classic example: Thread A holds lock L1 and wants lock L2. Thread B holds lock L2 and wants lock L1. Neither can proceed. This is called the ABBA deadlock. It's the most common pattern, but any cycle works.
Java's synchronized keyword and ReentrantLock are the tools that provide mutual exclusion, but they don't enforce the order in which you acquire locks. That's your responsibility.
    package io.thecodeforge.concurrent;

    public class DeadlockDemo {
        private static final Object lockA = new Object();
        private static final Object lockB = new Object();

        public static void main(String[] args) {
            // Thread1 takes lockA first, then lockB.
            Thread thread1 = new Thread(() -> {
                synchronized (lockA) {
                    System.out.println("Thread1: acquired lockA");
                    pause(100); // give thread2 time to take lockB
                    synchronized (lockB) {
                        System.out.println("Thread1: acquired lockB");
                    }
                }
            });
            // Thread2 takes the same locks in the opposite order: lockB, then lockA.
            Thread thread2 = new Thread(() -> {
                synchronized (lockB) {
                    System.out.println("Thread2: acquired lockB");
                    pause(100); // give thread1 time to take lockA
                    synchronized (lockA) {
                        System.out.println("Thread2: acquired lockA");
                    }
                }
            });
            thread1.start();
            thread2.start();
        }

        private static void pause(long ms) {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
        }
    }
- Each child has exclusive control over one utensil (mutual exclusion).
- Child A holds fork and waits for spoon; Child B holds spoon and waits for fork (circular wait).
- Neither can force the other to release (no preemption).
- Both have what the other needs (hold-and-wait).
- The only fix: create a rule (lock ordering) that says 'always pick up fork first, then spoon'.
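The lock-ordering rule from the analogy can be sketched in code. This is a minimal illustration, assuming both threads can be changed to take the locks in the same order; the class name OrderedLockDemo and the helper runBoth() are hypothetical and not part of the original demo.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class OrderedLockDemo {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    // Runs two threads that both take lockA before lockB and returns how many finished.
    static int runBoth() {
        AtomicInteger completed = new AtomicInteger();
        Runnable work = () -> {
            synchronized (lockA) {
                pause(50); // same timing window that triggers the deadlock in the ABBA version
                synchronized (lockB) {
                    completed.incrementAndGet();
                    System.out.println(Thread.currentThread().getName() + ": acquired lockA then lockB");
                }
            }
        };
        Thread t1 = new Thread(work, "Thread1");
        Thread t2 = new Thread(work, "Thread2");
        t1.start();
        t2.start();
        try {
            t1.join(); // returns, because the threads serialize instead of deadlocking
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Threads completed: " + runBoth());
    }
}
```

Because the second thread cannot hold lockB while waiting for lockA, the circular-wait condition never forms; the threads simply run one after the other.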
The Four Conditions for Deadlock (Coffman Conditions)
For a deadlock to occur, all four of these conditions must hold simultaneously. Break any one, and the deadlock disappears. This is your playbook for prevention.
1. Mutual exclusion – At least one resource must be held in a non-shareable mode. Only one thread can hold the lock at a time.
2. Hold and wait – A thread holds at least one resource and is waiting for additional resources held by other threads.
3. No preemption – Resources cannot be forcibly taken from a thread. Only the thread that holds it can release it.
4. Circular wait – There exists a set of waiting threads where each thread is waiting for a resource that the next thread holds. This is the cycle.
    package io.thecodeforge.concurrent;

    // Pseudocode: verify all four conditions are present
    public class ConditionCheck {
        boolean mutualExclusion = true; // lock is exclusive
        boolean holdAndWait = true;     // threads hold one lock while waiting for another
        boolean noPreemption = true;    // lock can't be taken from thread
        boolean circularWait = true;    // A waits for B, B waits for A

        public boolean isDeadlock() {
            return mutualExclusion && holdAndWait && noPreemption && circularWait;
        }
    }
How to Detect Deadlocks in Production
Deadlock detection in production relies on two primary methods: thread dump analysis and the ThreadMXBean API. You need both in your toolbelt.
The first thing to do when your service freezes is to capture a thread dump. Use jstack <pid> or kill -3 <pid> on Linux. The thread dump shows every thread's state and which locks it holds. Look for threads with java.lang.Thread.State: BLOCKED and a stack trace that shows waiting on a lock that another BLOCKED thread holds. That's your cycle.
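For programmatic detection, use ThreadMXBean.findMonitorDeadlockedThreads(). It returns an array of IDs for threads that are deadlocked on object monitors (synchronized), or null if no deadlock exists. If your code also uses java.util.concurrent locks such as ReentrantLock, call findDeadlockedThreads() instead, which covers both monitors and ownable synchronizers. You can wrap either call in a health check endpoint or a background thread that checks periodically and alerts your team.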
    package io.thecodeforge.concurrent;

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class DeadlockDetector {
        private final ThreadMXBean bean = ManagementFactory.getThreadMXBean();

        public void detectDeadlocks() {
            long[] deadlockedIds = bean.findMonitorDeadlockedThreads();
            if (deadlockedIds == null) {
                System.out.println("No deadlock detected.");
                return;
            }
            ThreadInfo[] infos = bean.getThreadInfo(deadlockedIds);
            for (ThreadInfo info : infos) {
                if (info == null) continue; // thread ended between the two calls
                // getLockName() is the lock this thread is waiting for;
                // getLockOwnerName() is the thread that currently holds it.
                System.err.printf("DEADLOCK DETECTED: Thread %s (id=%d) waits for %s held by %s%n",
                        info.getThreadName(), info.getThreadId(),
                        info.getLockName(), info.getLockOwnerName());
            }
            // Alert on-call or trigger thread dump
        }

        public static void main(String[] args) throws InterruptedException {
            // Simulate deadlock with ABBA pattern
            Object lock1 = new Object();
            Object lock2 = new Object();
            Thread t1 = new Thread(() -> {
                synchronized (lock1) {
                    try { Thread.sleep(100); } catch (InterruptedException e) {}
                    synchronized (lock2) {}
                }
            });
            Thread t2 = new Thread(() -> {
                synchronized (lock2) {
                    try { Thread.sleep(100); } catch (InterruptedException e) {}
                    synchronized (lock1) {}
                }
            });
            // Daemon threads let the JVM exit even though they stay deadlocked.
            t1.setDaemon(true);
            t2.setDaemon(true);
            t1.start();
            t2.start();
            Thread.sleep(500);
            new DeadlockDetector().detectDeadlocks();
        }
    }
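The "background thread that checks periodically" idea can be sketched as a small watchdog. This is one possible shape, not a prescribed implementation: the class name DeadlockWatchdog, the thread name, and the 10-second period are assumptions chosen for illustration, and the alerting step is left as a plain log line.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class DeadlockWatchdog {
    private static final ThreadMXBean BEAN = ManagementFactory.getThreadMXBean();

    // Runs one check and returns the number of deadlocked threads (0 if none).
    static int checkOnce() {
        // findDeadlockedThreads() covers monitors and ownable synchronizers such as ReentrantLock.
        long[] ids = BEAN.findDeadlockedThreads();
        if (ids == null) {
            return 0;
        }
        // true, true: also collect locked monitors and locked synchronizers
        for (ThreadInfo info : BEAN.getThreadInfo(ids, true, true)) {
            if (info == null) continue; // thread ended between the two calls
            System.err.printf("DEADLOCK: %s waits for %s held by %s%n",
                    info.getThreadName(), info.getLockName(), info.getLockOwnerName());
        }
        return ids.length;
    }

    // Schedules checkOnce() on a daemon thread; the caller owns the returned scheduler.
    static ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "deadlock-watchdog"); // hypothetical thread name
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(DeadlockWatchdog::checkOnce,
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = start(10);
        System.out.println("Deadlocked threads right now: " + checkOnce());
        scheduler.shutdownNow();
    }
}
```

In a real service the System.err line would be replaced by whatever alerting hook the team already uses; the detection call itself does not release any locks.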
How to Reproduce and Analyze a Deadlock
Reproducing a deadlock is essential for understanding the mechanism. You'll write a simple program that causes an ABBA deadlock, then use jstack to see it live. This skill translates directly to debugging production issues.
First, write two threads. Thread1 locks resource A, sleeps briefly so that Thread2 has time to lock resource B, then tries to lock B. Thread2 does the opposite: lock B, sleep, then try to lock A. The sleep makes the deadlock reliable: without it, one thread might complete before the other starts, and no deadlock occurs.
After running, the program hangs. Capture a thread dump with jstack (or jcmd, or kill -3). In the dump, you'll see both threads in BLOCKED state, each waiting on a monitor held by the other. Held monitors appear as '- locked <0x...>' lines inside each stack trace, and jstack also prints a 'Found one Java-level deadlock' summary that names the threads in the cycle. The 'Locked ownable synchronizers' section (shown with jstack -l) lists java.util.concurrent locks such as ReentrantLock, not synchronized monitors.
Analyze the stack traces: the line numbers show where each thread is waiting. This tells you which resources are involved and in which order they were acquired. That's the information you need to fix the code globally.
    package io.thecodeforge.concurrent;

    public class ReproduceDeadlock {
        private static final Object resA = new Object();
        private static final Object resB = new Object();

        public static void main(String[] args) throws InterruptedException {
            Thread t1 = new Thread(() -> {
                synchronized (resA) {
                    System.out.println(Thread.currentThread().getName() + " locked resA");
                    sleep(100); // force timing to ensure deadlock
                    synchronized (resB) {
                        System.out.println(Thread.currentThread().getName() + " locked resB");
                    }
                }
            }, "Worker-1");
            Thread t2 = new Thread(() -> {
                synchronized (resB) {
                    System.out.println(Thread.currentThread().getName() + " locked resB");
                    sleep(100);
                    synchronized (resA) {
                        System.out.println(Thread.currentThread().getName() + " locked resA");
                    }
                }
            }, "Worker-2");
            t1.start();
            t2.start();
            t1.join();
            t2.join(); // never returns due to deadlock
        }

        private static void sleep(long ms) {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
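The same information you read from a thread dump (which monitor each thread holds, where it was locked, and what it is waiting for) is also available in-process. The sketch below is an illustration under assumptions: the class DeadlockAnatomy and its helpers are hypothetical, and it uses a CountDownLatch instead of sleep() so the deadlock forms deterministically.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockAnatomy {
    // Starts two daemon threads that deadlock (ABBA) and returns their ThreadInfo snapshots.
    static ThreadInfo[] provokeAndInspect() {
        Object resA = new Object();
        Object resB = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startWorker("Anatomy-1", resA, resB, bothHoldFirst);
        startWorker("Anatomy-2", resB, resA, bothHoldFirst);

        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5_000;
        long[] ids = null;
        // Poll until the JVM reports the cycle, or give up after five seconds.
        while (ids == null && System.currentTimeMillis() < deadline) {
            ids = bean.findMonitorDeadlockedThreads();
            if (ids == null) pause(20);
        }
        // true: include locked monitors; false: skip ownable synchronizers
        return ids == null ? new ThreadInfo[0] : bean.getThreadInfo(ids, true, false);
    }

    private static void startWorker(String name, Object first, Object second, CountDownLatch gate) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                gate.countDown();
                try {
                    gate.await(); // wait until the other thread also holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds 'second' and waits for 'first'
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit even though the threads stay deadlocked
        t.start();
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        for (ThreadInfo info : provokeAndInspect()) {
            System.out.printf("%s is %s, waiting for %s held by %s%n",
                    info.getThreadName(), info.getThreadState(),
                    info.getLockName(), info.getLockOwnerName());
            for (MonitorInfo held : info.getLockedMonitors()) {
                System.out.printf("  holds %s, locked at %s%n", held, held.getLockedStackFrame());
            }
        }
    }
}
```

The "locked at" stack frame plays the role of the line numbers in a jstack trace: it points at the synchronized block where each lock was taken, which is what you need to work out the acquisition order.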
The sleep() calls are not required for a deadlock to happen; they just make it reliably reproducible. Without them, one thread might finish before the other starts. In production, timing varies, so deadlocks are intermittent. In test code, use Thread.sleep() to force threads to acquire locks in the problematic order.
Prevention Strategies That Actually Work in Production
You have three main weapons against deadlocks, each with trade-offs. Choose based on your context.
1. Lock ordering (the gold standard): Establish a strict global order for acquiring locks. If all threads acquire locks in the same order, circular wait is impossible. Use static lock objects (or an enum) to enforce the order. This is the simplest and most reliable approach, but it only works when you control all the locks.
2. tryLock with timeout: Instead of blocking indefinitely, use java.util.concurrent.locks.ReentrantLock.tryLock(long timeout, TimeUnit unit). If you can't acquire all needed locks within the timeout, release everything and retry. This breaks hold-and-wait. It works when lock ordering is too complex (e.g., dynamic resources, third-party code). Downside: you need to handle the failure path, and it can increase latency.
3. Lock-free data structures: Use java.util.concurrent classes like ConcurrentHashMap, CopyOnWriteArrayList, or AtomicReference. The atomic classes are built on CAS (compare-and-swap), and the concurrent collections keep any internal locking short and private, so ordinary reads and writes never hold one of their locks while waiting for one of yours. They take lock-ordering deadlocks off the table but constrain the operations you can perform atomically.
Which one you pick depends on your constraints: Can you enforce order globally? Then do it. Is the codebase too tangled? Use tryLock. Can you use concurrent collections? Prefer lock-free.
    package io.thecodeforge.concurrent;

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantLock;

    public class SafeTransfer {
        // Unique, increasing IDs. System.identityHashCode() can collide, and two
        // accounts with equal IDs would end up being locked in opposite orders.
        private static final AtomicLong NEXT_ID = new AtomicLong();

        private final long id = NEXT_ID.getAndIncrement();
        private final Lock lock = new ReentrantLock();
        private long balance;

        // Lock ordering: always lock accounts in account ID order
        public boolean transfer(SafeTransfer to, long amount) {
            SafeTransfer first = this.id() < to.id() ? this : to;
            SafeTransfer second = this.id() < to.id() ? to : this;
            first.lock.lock();
            try {
                second.lock.lock();
                try {
                    if (this.balance < amount) return false;
                    this.balance -= amount;
                    to.balance += amount;
                    return true;
                } finally {
                    second.lock.unlock();
                }
            } finally {
                first.lock.unlock();
            }
        }

        // Alternative using tryLock with timeout
        public boolean transferWithTimeout(SafeTransfer to, long amount, long timeout, TimeUnit unit)
                throws InterruptedException {
            if (!this.lock.tryLock(timeout, unit)) return false;
            try {
                if (!to.lock.tryLock(timeout, unit)) {
                    // this.lock is released by the finally below; a retry loop is
                    // possible but not shown for brevity
                    return false;
                }
                try {
                    if (this.balance < amount) return false;
                    this.balance -= amount;
                    to.balance += amount;
                    return true;
                } finally {
                    to.lock.unlock();
                }
            } finally {
                this.lock.unlock();
            }
        }

        private long id() {
            return id;
        }
    }
- Every employee uses the same rule: grab book A before book B.
- No one ever holds B while waiting for A, because anyone who needs both must already hold A before reaching for B.
- tryLock is like a librarian who says 'if you can't get the next book in 5 seconds, put everything back and start over'.
- Lock-free is like a library where every book is digital — no one ever blocks on another person.
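For the lock-free strategy, a single balance can be updated with a compare-and-swap loop instead of a lock. This is a minimal sketch under assumptions: LockFreeBalance and concurrentDemo() are hypothetical names, and the example covers one account only. Moving money between two accounts atomically is exactly the kind of multi-step operation that this approach does not give you for free.

```java
import java.util.concurrent.atomic.AtomicLong;

final class LockFreeBalance {
    private final AtomicLong balance;

    LockFreeBalance(long initial) {
        this.balance = new AtomicLong(initial);
    }

    // Withdraws only if funds suffice. If another thread changed the balance
    // between the read and the CAS, the loop re-reads and tries again.
    boolean withdraw(long amount) {
        while (true) {
            long current = balance.get();
            if (current < amount) {
                return false;
            }
            if (balance.compareAndSet(current, current - amount)) {
                return true;
            }
        }
    }

    void deposit(long amount) {
        balance.addAndGet(amount);
    }

    long get() {
        return balance.get();
    }

    // Four threads make 2,000 one-unit withdrawal attempts against a balance of 1,000.
    static long concurrentDemo() {
        LockFreeBalance account = new LockFreeBalance(1_000);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 500; n++) {
                    account.withdraw(1);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return account.get();
    }

    public static void main(String[] args) {
        System.out.println("Final balance: " + concurrentDemo());
    }
}
```

No thread ever blocks while holding something another thread needs, so hold-and-wait cannot occur. The cost is that the balance check and the update have to fit into a single atomic variable.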
How to Break a Deadlock Without Changing Code Order
You can't always reorder locks. When you inherit a codebase where two services lock resources in different orders, changing the acquisition order is a rewrite. The practical fix is a lock timeout. Java's ReentrantLock lets you attempt a lock with a timeout. If Thread A can't acquire Lock B in 2 seconds, it releases Lock A, sleeps, and retries. Because the thread gives up what it holds instead of waiting forever, this removes the hold-and-wait condition, and the cycle cannot persist. It's not free: lock-and-retry loops increase CPU usage and latency. But it's the difference between a blocked user and graceful degradation. Where lock order can't be guaranteed, treat tryLock() with a timeout as your first line of defense. Don't assume lock order will stay fixed; assume it will break and build recovery into the locking mechanism itself.
    // io.thecodeforge
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.ReentrantLock;

    public class DeadlockBreaker {
        private final ReentrantLock lockA = new ReentrantLock();
        private final ReentrantLock lockB = new ReentrantLock();

        public void threadOne() {
            try {
                if (lockA.tryLock(2, TimeUnit.SECONDS)) {
                    try {
                        // Simulate work
                        if (lockB.tryLock(2, TimeUnit.SECONDS)) {
                            try {
                                System.out.println("Thread 1: Got both locks");
                            } finally {
                                lockB.unlock();
                            }
                        } else {
                            System.out.println("Thread 1: Could not get lock B, releasing lock A");
                        }
                    } finally {
                        lockA.unlock();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // threadTwo() mirrors threadOne but uses same lock order
    }
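The release-and-retry loop described above ("releases Lock A, sleeps, and retries") can be sketched as a small helper. This is an illustration under assumptions, not the article's implementation: the class RetryingLocker, the method runWithBoth(), and the attempt and timeout values in demo() are all hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class RetryingLocker {
    // Tries to take both locks. On failure it releases whatever it holds, pauses for a
    // random interval, and retries. Returns true if the action ran, false if the
    // attempts were exhausted or the thread was interrupted.
    static boolean runWithBoth(ReentrantLock first, ReentrantLock second, Runnable action,
                               int maxAttempts, long timeoutMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                if (first.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    try {
                        if (second.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                            try {
                                action.run();
                                return true;
                            } finally {
                                second.unlock();
                            }
                        }
                    } finally {
                        first.unlock(); // giving up the held lock is what breaks hold-and-wait
                    }
                }
                // Randomized pause so two contending threads do not retry in lockstep.
                Thread.sleep(ThreadLocalRandom.current().nextLong(1, 20));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Two threads request the same locks in opposite orders; returns how many succeeded.
    static int demo() {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();
        AtomicInteger done = new AtomicInteger();
        Thread t1 = new Thread(() -> {
            if (runWithBoth(a, b, () -> { }, 50, 50)) done.incrementAndGet();
        });
        Thread t2 = new Thread(() -> {
            if (runWithBoth(b, a, () -> { }, 50, 50)) done.incrementAndGet();
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("Threads that completed: " + demo());
    }
}
```

The caller still has to decide what a false return means for the business operation, for example failing the request or queueing it for later.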
Live-Lock: The Deceptive Cousin That Wastes CPU
You fix deadlock with timeouts and your threads stop hanging. But your CPU spikes to 100% and throughput drops. Welcome to live-lock. Threads aren't blocked; they're actively retrying the same lock acquisition in a loop. They're working harder than in a deadlock, because they're burning cycles failing, backing off, and retrying. Production code must distinguish between 'back off and retry' and 'back off and wait for a signal.' Don't use a fixed Thread.sleep() interval as a backoff strategy: it's blind, and threads that sleep for the same duration collide again when they wake. Where you can, use a Condition variable or a Semaphore to wake threads when a resource becomes free. If you retry, use exponential backoff with jitter. Live-lock is harder to detect than deadlock because your threads look alive. Monitor lock acquisition attempts per second. If they're high and the success rate is low, you're live-locked.
    // io.thecodeforge
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.locks.ReentrantLock;

    public class LiveLockDetector {
        private final ReentrantLock lockA = new ReentrantLock();
        private final ReentrantLock lockB = new ReentrantLock();
        private final AtomicInteger retryCount = new AtomicInteger(0);

        public void workWithBackoff() {
            while (true) {
                if (lockA.tryLock()) {
                    try {
                        if (lockB.tryLock()) {
                            try {
                                System.out.println("Work done after retries: " + retryCount.get());
                                return;
                            } finally {
                                lockB.unlock();
                            }
                        }
                    } finally {
                        lockA.unlock();
                    }
                }
                // Exponential backoff: 10ms, 20ms, 40ms, 80ms, then capped at 100ms.
                // The shift is capped so it cannot overflow into a negative sleep time.
                int shift = Math.min(retryCount.getAndIncrement(), 4);
                long backoff = Math.min(100, 10L * (1 << shift));
                try {
                    // Jitter: add a random extra delay of up to one backoff interval.
                    Thread.sleep(backoff + (long) (Math.random() * backoff));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
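The monitoring advice above (track acquisition attempts against successes) can be sketched with a thin wrapper around ReentrantLock. The class InstrumentedLock and its method names are hypothetical; a real service would more likely export these counters through its existing metrics library.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantLock;

final class InstrumentedLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final LongAdder attempts = new LongAdder();
    private final LongAdder successes = new LongAdder();

    // Non-blocking attempt that records whether it succeeded.
    boolean tryLock() {
        attempts.increment();
        boolean acquired = lock.tryLock();
        if (acquired) {
            successes.increment();
        }
        return acquired;
    }

    void unlock() {
        lock.unlock();
    }

    long attempts() {
        return attempts.sum();
    }

    // Fraction of tryLock() calls that succeeded; 1.0 when nothing was attempted yet.
    double successRate() {
        long total = attempts.sum();
        return total == 0 ? 1.0 : (double) successes.sum() / total;
    }

    // One thread holds the lock while another fails three times: 1 success in 4 attempts.
    static double demo() {
        InstrumentedLock shared = new InstrumentedLock();
        shared.tryLock(); // succeeds: the lock is free
        Thread contender = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                shared.tryLock(); // fails: the lock is held by the other thread
            }
        });
        contender.start();
        try {
            contender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        shared.unlock();
        return shared.successRate();
    }

    public static void main(String[] args) {
        System.out.println("Success rate: " + demo());
    }
}
```

A sustained high attempt count combined with a low success rate is the signal to look for; what counts as "low" depends on the workload and is not something this sketch can decide.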
Testing for Deadlocks: Build a Stress Lab Before Production
You cannot trust that a unit test will catch deadlocks. They are non-deterministic — they happen under load, specific thread scheduling, or at 4 AM during a deployment. The only way to validate deadlock-free code is with a stress test that forces lock contention. Write a test that spawns dozens of threads, each acquiring locks in opposite order, and runs for 30 seconds. Use ThreadMXBean to check for deadlocks after execution. If your code survives 10,000 iterations of that torture, it's probably safe. Do not rely on static analysis alone — it produces false positives and misses interleaving bugs. The JVM's built-in deadlock detection via ThreadMXBean is your best friend in CI pipelines. Add a test that fails if a deadlock is detected. This is non-negotiable for any service with shared mutable state.
    // io.thecodeforge
    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;
    import java.util.concurrent.CountDownLatch;

    public class DeadlockStressTest {
        private static final Object resourceA = new Object();
        private static final Object resourceB = new Object();

        public void runStressTest() throws InterruptedException {
            ThreadMXBean bean = ManagementFactory.getThreadMXBean();
            CountDownLatch latch = new CountDownLatch(1); // start gate: release all workers at once

            Runnable task1 = () -> {
                try {
                    latch.await();
                    synchronized (resourceA) {
                        Thread.sleep(1);
                        synchronized (resourceB) {}
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            Runnable task2 = () -> {
                try {
                    latch.await();
                    synchronized (resourceB) {
                        Thread.sleep(1);
                        synchronized (resourceA) {}
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };

            for (int i = 0; i < 50; i++) {
                startDaemon(task1);
                startDaemon(task2);
            }
            latch.countDown();
            Thread.sleep(5000);

            long[] deadlockedIds = bean.findDeadlockedThreads();
            if (deadlockedIds != null) {
                throw new RuntimeException("Deadlock detected! Threads: " + deadlockedIds.length);
            }
            System.out.println("No deadlock: stress test passed");
        }

        // Daemon threads keep a detected deadlock from hanging the test JVM.
        private static void startDaemon(Runnable task) {
            Thread t = new Thread(task);
            t.setDaemon(true);
            t.start();
        }
    }
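A stress test is most useful when it is pointed at the code you believe is fixed. The sketch below hammers an ID-ordered transfer from several threads and then checks three things: the workers finished, ThreadMXBean reports no deadlock, and no money was created or lost. Everything here (OrderedTransferStress, the Account class, eight accounts with 1,000 each, the 30-second limit) is an assumption chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class OrderedTransferStress {
    static final class Account {
        final int id;
        final ReentrantLock lock = new ReentrantLock();
        long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Always locks the account with the smaller ID first, whatever the transfer direction.
    static void transfer(Account from, Account to, long amount) {
        if (from == to) return;
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                if (from.balance >= amount) {
                    from.balance -= amount;
                    to.balance += amount;
                }
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Returns true if all workers finished, no deadlock was reported, and the total is unchanged.
    static boolean run(int threads, int transfersPerThread) {
        Account[] accounts = new Account[8];
        for (int i = 0; i < accounts.length; i++) {
            accounts[i] = new Account(i, 1_000);
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < transfersPerThread; i++) {
                    transfer(accounts[rnd.nextInt(accounts.length)],
                             accounts[rnd.nextInt(accounts.length)],
                             rnd.nextInt(1, 50));
                }
            });
        }
        pool.shutdown();
        boolean finished;
        try {
            finished = pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        }
        long[] deadlocked = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        if (!finished || deadlocked != null) {
            pool.shutdownNow();
            return false;
        }
        long total = 0;
        for (Account a : accounts) {
            a.lock.lock(); // read under the lock so the value is safely visible
            try {
                total += a.balance;
            } finally {
                a.lock.unlock();
            }
        }
        return total == 8_000L;
    }

    public static void main(String[] args) {
        System.out.println(run(8, 100_000) ? "Stress test passed" : "Stress test FAILED");
    }
}
```

Swapping the ordered transfer() for a version that locks from first and to second is a quick way to confirm the harness actually catches the bug it is meant to guard against.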
The Silent Payment Processing Outage
- Never assume resource contention without capturing a thread dump.
- Lock ordering isn't optional — it's a deployment requirement.
- Add thread dump capture (jstack <pid>) to your incident response runbook. It's the only way to confirm deadlock.
Expose ThreadMXBean.findMonitorDeadlockedThreads() in a health check or monitoring endpoint. Log the stack trace when it returns non-null.
Capture and inspect a thread dump:
    jstack <pid> > dump_$(date +%H%M%S).txt
    grep -B5 -A10 'java.lang.Thread.State: BLOCKED' dump_*.txt
    jcmd <pid> Thread.print
    curl http://localhost:8080/actuator/threaddump    # if Actuator is enabled
Check for deadlocks from code:
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    long[] ids = bean.findMonitorDeadlockedThreads();
    if (ids != null) {
        for (long id : ids) {
            ThreadInfo info = bean.getThreadInfo(id);
            System.err.println(info.getThreadName() + " " + info.getLockName());
        }
    }
| Strategy | When to Use | Performance Impact | Difficulty of Adoption | Risk of Incorrect Implementation |
|---|---|---|---|---|
| Lock ordering | You control all lock acquisitions | Negligible (no extra syscalls) | Medium — requires code audit | Low (if enforced with static checks) |
| tryLock with timeout | Third-party or dynamic lock acquisitions | Medium — timeout adds latency and retries | High — must design recovery logic | Medium — incorrect timeout handling can cause thread starvation |
| Lock-free data structures | Single shared resource (e.g., ConcurrentHashMap) | Very high under contention (CAS loops), low otherwise | Low — just use java.util.concurrent classes | Low — well-tested by JDK |
Common mistakes to avoid
Nesting synchronized blocks without consistent ordering
Relying on synchronized as if it prevents deadlocks
Restarting the JVM without capturing a thread dump
Using ThreadMXBean detection but only logging and not alerting
Interview Questions on This Topic
What are the four necessary conditions for a deadlock?
How would you detect a deadlock in a running Java application?
Capture a thread dump with jstack <pid> or kill -3 <pid> on Unix. For programmatic detection, use ThreadMXBean.findMonitorDeadlockedThreads(), which returns the IDs of deadlocked threads, or null if none. You can expose this in a health check endpoint for continuous monitoring.
You receive a production alert: the microservice is unresponsive, no errors in logs, CPU is normal. Walk me through your debugging process.
I start by capturing a thread dump with jstack (or jcmd Thread.print). I look for threads in BLOCKED state and check if any two are waiting on each other's locks; that confirms a deadlock. If not, I check for WAITING threads that may indicate thread pool exhaustion or a missing notify(). I then analyse the locks involved, note the class and method names, and search the codebase for the lock acquisition order. The fix is either lock ordering (if I can control the order) or tryLock with timeout if the order is dynamic. During the incident, I restart the JVM only after saving the dump. Post-mortem, I write a test that reproduces the timing and verify the fix prevents the cycle.
Frequently Asked Questions
Yes, a deadlock can occur with ReentrantLock. It provides the same mutual exclusion as synchronized, so nesting lock() calls without consistent ordering can still produce a cycle. Use tryLock with a timeout or enforce lock ordering to prevent it.
In deadlock, threads are stuck waiting and never proceed. In livelock, threads are actively executing but making no progress — they keep trying a failed operation and retry immediately, consuming CPU but never succeeding. Livelock is detectable via high CPU usage and repeated failed attempts in logs.
Apply lock ordering: define a global order (e.g., by lock object identity or a numeric ID), and always acquire locks in that order. Alternatively, if you cannot define a consistent order, use tryLock with a timeout: attempt to acquire all locks within a timeout, and if you fail, release everything and retry.
The JVM can detect deadlocks via the ThreadMXBean API, but it does not automatically break them. The JVM will not forcibly release locks. Detection is available programmatically or via thread dump analysis, but you must implement the monitoring and recovery logic yourself.