Back to the path
Level 5 advanced 27 min #concurrency#thread-safety#locking#atomics#cas#deadlock#lld#interview

Concurrency for LLD: Race Conditions, Locks, and Thread-Safe Design

A working engineer's guide to making shared mutable state safe in LLD interviews: spotting race conditions, choosing locking granularity, using atomics and CAS loops correctly, avoiding deadlock with lock ordering, and defending throughput — with real, compilable locking code in TypeScript, Python, and C++, plus memory-ordering and thread-safety details.

Why concurrency shows up in LLD interviews

Most LLD prompts — a rate limiter, an in-memory key-value store, a parking lot, a bounded inventory, an LRU cache, a connection pool — are secretly concurrency problems. The interviewer rarely says “make it thread-safe” up front. They let you build the single-threaded version, then ask the killer follow-up:

“Two threads call book() for the last seat at the same time. What happens?”

The single-threaded design that earned you the SDE-1 offer will double-book. The SSE bar is recognizing where shared mutable state lives, articulating the exact interleaving that breaks it, and fixing it with the minimum synchronization that preserves correctness without serializing your whole system.

This lesson gives you a precise mental model and drop-in locking code in all three languages. The goal is not to memorize synchronized — it’s to reason about what you’re protecting, how much of it, and what it costs.

Key idea

The entire discipline reduces to one sentence: a critical section is a sequence of operations that must appear atomic with respect to other threads. Find every critical section, protect each with the cheapest correct mechanism, and order your locks consistently. Everything below is an elaboration of that sentence.

Race conditions: what actually breaks

A race condition exists when the correctness of a program depends on the relative timing of threads. The canonical example is a non-atomic read-modify-write.

Consider counter += 1. It looks like one operation. It is three:

   Thread A                Thread B
   -----------             -----------
1. read counter (=5)
2.                         read counter (=5)
3. compute 5+1 = 6
4.                         compute 5+1 = 6
5. write counter = 6
6.                         write counter = 6   <- lost update! should be 7

Both threads read 5, both write 6. One increment vanished. This is a lost update, the most common race in LLD designs. The same shape appears as:

  • Check-then-actif (seats > 0) seats--; Two threads pass the check, both decrement, you oversell.
  • Read-modify-write — counters, balances, reference counts.
  • Lazy initializationif (instance == null) instance = new X(); two threads each construct one.

In an interview, do this: when asked “is this safe?”, draw the interleaving table above on the whiteboard. Naming the exact step where two threads overlap is what separates “I think we need a lock” from a senior answer.

The critical section

The fix is to make the read-modify-write mutually exclusive: at most one thread inside the critical section at a time.

        +-----------------------------+
        |   acquire lock              |
        |   --- CRITICAL SECTION ---  |
        |   read  -> modify -> write  |   <- only ONE thread here
        |   --------------------------|
        |   release lock              |
        +-----------------------------+

The critical section should be as small as correctness allows — wrap the shared-state mutation, not the surrounding I/O, logging, or computation that uses only local data.

Mutex / Lock: the workhorse

A mutex (mutual-exclusion lock) is the default tool. Here is the same safe counter in all three languages. Note how each language expresses “hold the lock for exactly this block.”

// Node runs JS on a single thread, so a plain counter is already safe.
// But Worker Threads + SharedArrayBuffer DO share memory. Use Atomics,
// the canonical TS concurrency primitive over a SAB.
const sab = new SharedArrayBuffer(4);
const counter = new Int32Array(sab); // shared across workers

function increment(): void {
// Atomics.add is a single indivisible read-modify-write.
Atomics.add(counter, 0, 1);
}

function read(): number {
return Atomics.load(counter, 0);
}

// For mutual exclusion over a COMPOUND critical section (more than one
// add), build a simple parking spinlock on Atomics. This is a
// spinlock-with-park, NOT a fair mutex. Atomics.wait blocks the calling
// thread; it throws only where the agent cannot block (a BROWSER main
// thread). Node's main thread is allowed to block, so it does NOT throw
// there — but blocking the main thread is still a bad idea, so confine
// this spinlock to Workers (or guard the call) regardless.
class Mutex {
constructor(private flag: Int32Array) {} // length-1 Int32Array in a SAB

lock(): void {
  // compareExchange(flag, 0, expected=0, replacement=1) returns the OLD
  // value. If it was 0 we won the lock and stop.
  while (Atomics.compareExchange(this.flag, 0, 0, 1) !== 0) {
    // Park until the value is no longer 1. If a releaser already set it
    // to 0, Atomics.wait returns 'not-equal' immediately and we re-CAS.
    // So there is no lost wakeup, but under contention we may spin.
    Atomics.wait(this.flag, 0, 1);
  }
}

unlock(): void {
  Atomics.store(this.flag, 0, 0);
  Atomics.notify(this.flag, 0, 1); // wake one parked waiter
}
}

Three idioms to internalize, one per language:

  • C++std::lock_guard / std::scoped_lock (RAII). The destructor releases the lock, so it survives early returns and exceptions. Never call mtx.lock() / mtx.unlock() by hand.
  • Pythonwith self._lock:. Same RAII spirit via the context manager. threading.Lock is the basic non-reentrant lock; use threading.RLock if the same thread must re-acquire. (Typing note: threading.Lock is a factory function, not a class. Annotations like list[threading.Lock] work at runtime, but strict mypy may flag using it as a type — annotate with threading.Lock and add # type: ignore if your checker complains.)
  • TypeScript/Java mental model — JS execution is single-threaded, so single-runtime code needs no lock; real shared memory only appears with Worker Threads + SharedArrayBuffer, where Atomics is the primitive. In Java you’d say synchronized(this) or a ReentrantLock. Say this distinction out loud in an interview — claiming a Node service “needs a mutex” for in-process state is a red flag.
Watch out

Always release on every path. A lock acquired but not released (because of an early return, a thrown exception, or a break) is a permanent deadlock. RAII (lock_guard) and with/finally exist precisely to make release automatic. Hand-rolled lock/unlock pairs are the #1 source of leaked locks.

Atomics: when you don’t need a full lock

If your critical section is a single read-modify-write on a primitive — an increment, a flag flip, a compare-and-swap — an atomic is faster than a mutex.

Key idea

Scope the “lock-free” claim carefully. The RMW atomicsAtomics.add/load/store, a successful CAS, fetch_add — are lock-free and typically compile to a single CPU instruction (e.g. LOCK XADD); they don’t acquire an OS lock or park the thread. But Atomics.wait and a contended futex do block — in fact the Mutex above is built on Atomics.wait parking the thread. So “never blocks” applies to the RMW family, not to every function in the Atomics namespace.

// A lock-free monotonic ID generator across Worker Threads.
const idCell = new Int32Array(new SharedArrayBuffer(4));

function nextId(): number {
// Atomics.add returns the value BEFORE the add, like fetch_add.
return Atomics.add(idCell, 0, 1);
}

Compare-and-swap, the right way

CAS (compare-and-swap) is the building block of lock-free algorithms, but it is almost never used as a one-shot. Its power comes from a retry loop: read the current value, compute a new value from it, and atomically swap only if nobody changed it in between. The crucial detail in C++ is that compare_exchange_* updates expected by reference on failure — it writes back the value it actually saw, so the next loop iteration recomputes from fresh data.

// CAS retry loop: atomically multiply the shared cell by 2, lock-free.
function doubleValue(cell: Int32Array): void {
while (true) {
  const observed = Atomics.load(cell, 0);
  const next = observed * 2;
  // compareExchange returns the OLD value. If it equals 'observed' the
  // swap succeeded; otherwise someone raced us — loop and retry.
  if (Atomics.compareExchange(cell, 0, observed, next) === observed) return;
}
}

Atomic vs. mutex — the decision rule:

Use an atomic when…Use a mutex when…
A single variable, single RMW opMultiple variables must change together
Counters, flags, CAS-based structuresA compound invariant must hold (e.g. “balance ≥ 0 AND log updated”)
You want lock-free / wait-free progressThe critical section calls into other code
Contention is high and ops are tinyCorrectness needs a consistent multi-field snapshot
Common pitfall

You cannot compose two atomics into one atomic operation. if (atomicSeats.load() > 0) atomicSeats.fetch_sub(1); still races — another thread can decrement between the load and the subtract. To make “check N seats then decrement” atomic you need either a single CAS loop (read available, compute available−n, CAS) or a mutex. Atomicity does not compose for free.

A realistic LLD example: a thread-safe seat-booking inventory

This is the shape of the question you’ll actually get. Requirements: book(n) succeeds only if n seats remain, and never oversells under concurrency. The invariant is 0 <= available <= capacity. Because the check and the decrement together form one invariant, this needs a mutex or a CAS loop, not a bare atomic.

// Single-runtime Node: synchronous => already serialized, no lock needed.
// Across Worker Threads, guard with the Atomics-based Mutex from above.
class Inventory {
private available: number;
constructor(private capacity: number) { this.available = capacity; }

// Synchronous => atomic in a single JS runtime. Document this assumption.
book(n: number): boolean {
  if (n <= 0 || this.available < n) return false; // check
  this.available -= n;                            // act
  return true;
}
}

// CAS-loop variant proving "check-then-act" needs no mutex (across
// Workers over a SAB). The invariant 'available - n >= 0' is enforced
// atomically by only committing when the CAS observes the same value.
function bookCas(cell: Int32Array, n: number): boolean {
if (n <= 0) return false;
while (true) {
  const avail = Atomics.load(cell, 0);
  if (avail < n) return false;                       // not enough seats
  if (Atomics.compareExchange(cell, 0, avail, avail - n) === avail) {
    return true;                                     // committed atomically
  }
  // someone changed 'available' between load and CAS — retry
}
}

The crucial design point: the if and the -= are inside the same lock acquisition. A common buggy “fix” locks only the decrement, leaving the check outside — that still oversells. State the invariant, then make sure the entire check-then-act lives in one critical section. The CAS-loop variant shows the same invariant enforced without a mutex: it only commits the decrement when compareExchange confirms available is unchanged since the check.

Locking granularity: correctness vs. throughput

A single global lock is always correct and often terrible. If your in-memory store has one lock for the whole map, every get/put serializes — your multi-core machine runs at single-core speed. The art is granular locking: hold the smallest lock for the shortest time.

   Coarse-grained (one big lock)        Fine-grained (lock striping)
   +--------------------------+         +------+ +------+ +------+
   |        [ LOCK ]          |         |lock0 | |lock1 | |lock2 |
   |  bucket0 bucket1 bucket2 |         |bkt0  | |bkt1  | |bkt2  |
   +--------------------------+         +------+ +------+ +------+
   simple, correct, low concurrency     N locks -> N concurrent writers

Lock striping: shard the data into N partitions, each with its own lock. A key hashes to a stripe; operations on different stripes proceed in parallel. This is exactly how Java’s old ConcurrentHashMap (segments) worked.

// Stripe selection. JS bitwise ops are signed 32-bit, so '>>> 0'
// forces the hash unsigned before '%'. (See the cross-language note below
// — do NOT copy a bare 'hash % n' into a language with truncated modulo.)
function stripeFor(key: string, stripes: number): number {
let h = 0;
for (const ch of key) h = (Math.imul(31, h) + ch.charCodeAt(0)) | 0;
return (h >>> 0) % stripes; // unsigned, in [0, stripes)
}
Watch out

Cross-language modulo asymmetry. All three implementations index by hash % stripes, and all three are correct — but for different reasons. C++ std::hash returns unsigned size_t. JS forces unsigned with >>> 0. Python’s hash() can be negative, but Python’s floored modulo still yields [0, stripes). Do not port a bare hash % n into C or older JS bitwise contexts, where modulo truncates toward zero and a negative hash gives a negative, out-of-bounds index. Normalize to unsigned explicitly when in doubt.

Read-heavy workloads deserve a separate tool: a read-write lock (std::shared_mutex in C++17, ReentrantReadWriteLock in Java; Python’s stdlib has none — you’d build one or use a third-party). Many readers share the lock concurrently; a writer takes it exclusively. Use it when reads vastly outnumber writes (a config cache, a routing table).

Tip

Granularity is a trade-off you should verbalize, not a default. Finer locks → more parallelism but more memory, more complexity, and more deadlock surface. In an interview, start with one coarse lock for correctness, then say: “If profiling shows contention here, I’d stripe by key into N partitions” — showing you optimize measured contention, not imagined contention.

Deadlock: the four conditions and how to break one

A deadlock is a cycle of threads each holding a lock the next one wants. Classic case: thread A locks account 1 then 2; thread B locks account 2 then 1.

   Thread A holds L1, wants L2  --------+
        ^                               |
        | wants L1                      v  holds L2
        +-------------------------- Thread B
   Cyclic wait => neither proceeds => deadlock

Deadlock requires all four Coffman conditions simultaneously: mutual exclusion, hold-and-wait, no preemption, and circular wait. Break any one and you’re safe. The two practical techniques:

1. Lock ordering (break circular wait) — the default answer

Always acquire multiple locks in a global, consistent order (e.g. by ascending account ID). If everyone locks the lower ID first, no cycle can form.

// Order by a stable id so every transfer locks in the same sequence.
// (Conceptual — each Account owns the Atomics Mutex from earlier.)
interface Account { id: number; balance: number; lock(): void; unlock(): void; }

function transfer(a: Account, b: Account, amt: number): void {
// Always lock the lower id first => no cycle can form.
const [first, second] = a.id < b.id ? [a, b] : [b, a];
first.lock();
try {
  second.lock();
  try {
    a.balance -= amt;
    b.balance += amt;
  } finally {
    second.unlock();
  }
} finally {
  first.unlock();
}
}
Key idea

C++ gives you std::scoped_lock(a.mtx, b.mtx) (and the older std::lock(...)), which acquires multiple mutexes using a back-off algorithm that cannot deadlock regardless of order. Mention it in interviews — but also be able to explain the manual ascending-ID ordering, because most languages don’t have that helper.

2. The other three techniques (know them, but lead with ordering)

  • tryLock with timeout / back-off (break hold-and-wait / no preemption): try to grab the second lock; if it fails within a timeout, release the first and retry. Prevents indefinite waiting but can livelock — add randomized back-off.
  • Lock coarsening (break circular wait trivially): use a single lock for the whole transfer subsystem. Correct, simple, kills concurrency — fine for a low-throughput path.
  • Lock-free / no mutual exclusion: redesign so the operation is a single CAS (break mutual exclusion). Hardest, but eliminates the category.

In an interview, say this: “I’ll prevent deadlock by acquiring locks in a global order keyed on a stable ID. If a path can’t use a fixed order, I’d fall back to tryLock with timeout and back-off.” That one sentence demonstrates you know both the default and the escape hatch.

Putting it together: an interview checklist

When the follow-up “is this thread-safe?” lands, walk this sequence out loud:

  1. Locate shared mutable state. Which fields are read/written by more than one thread?
  2. Identify each critical section. Which read-modify-writes or check-then-acts must appear atomic? State the invariant.
  3. Pick the cheapest correct mechanism. Single primitive RMW → atomic/CAS. Compound invariant or multi-field → mutex. Read-heavy → read-write lock.
  4. Right-size granularity. Start coarse for correctness; stripe by key only where you can justify measured contention.
  5. Order locks by a global key to kill circular wait; mention tryLock/scoped_lock as alternatives.
  6. Guarantee release on every path with RAII / with / finally.

That checklist is the deliverable. The locking primitives are just vocabulary; the judgment about where and how much is what gets you the SSE-level signal.

Complexity and cost summary

MechanismTime per opSpaceConcurrencyWhen
Atomic RMW (fetch_add, CAS)O(1), 1 instr (CAS loop: O(retries))O(1)Lock-free, scales wellSingle primitive, hot path
Single mutexO(1) uncontended; serializes under contentionO(1)None across the guarded regionCompound invariant, simple
Lock striping (N stripes)O(1) + hashingO(N) locksUp to N concurrent writersHot map, measured contention
Read-write lockO(1); readers parallel, writers exclusiveO(1)High for read-heavyReads ≫ writes

CAS loops are amortized O(1) but unbounded in the worst case under heavy contention (each failed CAS retries) — that’s the price of being lock-free. A mutex bounds work per thread but serializes; the right choice depends on contention and critical-section size, which is exactly the trade-off the interviewer wants to hear you reason about.

Assessment

Pass mark: 70% on the concept questions unlocks the next lesson.

1. A teammate replaces a mutex-guarded seat counter with: `if (atomicSeats.load() > 0) atomicSeats.fetch_sub(1);` claiming it is now lock-free and correct. What is the actual outcome?

2. In the C++ CAS loop `do { desired = f(expected); } while (!x.compare_exchange_weak(expected, desired));`, why is passing `expected` by reference essential?

3. Which statements about `Atomics` in TypeScript/Node (over a SharedArrayBuffer) are TRUE? Select all that apply. (select all)

4. Why is `id_counter.fetch_add(1, std::memory_order_relaxed)` safe for a monotonic ID generator, even though the default would be seq_cst?

Design problem 5

Design a thread-safe bounded connection pool: `acquire()` returns an idle connection (blocking or failing if none are free) and `release(conn)` returns one to the pool. Capacity is fixed at N. Specify the shared state, the critical sections and their invariants, the synchronization primitive(s) you'd use (and why), how you avoid deadlock if a caller holds two pooled resources, and the throughput trade-off of your locking granularity. Provide a sketch in one language.