Module A-5 · 20 min read

Reentrant locking with Hash-stored reentry counters, lock hierarchies and consistent ordering to prevent circular waits, the watchdog pattern for automatic lock extension, and distinguishing mutex locks from counting semaphores.

A-5 — Reentrant Locks, Hierarchies, and Deadlock Prevention

Who this module is for: You have basic Redis locking working but are hitting edge cases: a function that holds a lock calls another function that acquires the same lock (a self-deadlock), or you need to acquire several locks without deadlocking against another process that needs the same ones. This module covers reentrant locks, lock hierarchies, and the patterns that prevent deadlock at scale.

The Reentrant Lock Problem

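A basic Redis lock (SET NX PX) is not reentrant. If a function holds lock:resource and calls a subroutine that also tries to acquire lock:resource, the subroutine is waiting for itself to release a lock it holds. Depending on the retry policy, it either gives up once its retries run out, or keeps waiting until the TTL expires and then acquires the lock while the outer caller still believes it holds it.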

processOrder(orderId)
  → acquireLock('order:1001')     ✓ acquired
  → validateInventory(orderId)
      → acquireLock('order:1001') ✗ blocks: already held by its own caller (deadlock)

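In Node.js this is a genuine deadlock even though the event loop is never blocked. The await suspends processOrder without blocking anything, but processOrder's release code only runs after validateInventory returns, and validateInventory cannot return until the inner acquireLock succeeds, which requires that release. Each step waits on the other, so the chain never completes on its own.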
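The await chain can be reproduced without Redis. The sketch below is a hypothetical in-process stand-in: a Set plays the role of the non-reentrant lock, acquireLock polls with a 200 ms acquisition timeout (so the demo terminates, where a real lock might instead wait out its TTL), and a timer counts ticks while the nested call waits, showing that the event loop keeps running.

```typescript
// Hypothetical in-process stand-in for a non-reentrant SET NX lock (no Redis, no TTL)
const held = new Set<string>();

// Polls every 10 ms until the key is free or timeoutMs elapses
async function acquireLock(key: string, timeoutMs: number): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!held.has(key)) {
      held.add(key);
      return true;
    }
    await new Promise<void>((resolve) => setTimeout(() => resolve(), 10));
  }
  return false;
}

function releaseLock(key: string): void {
  held.delete(key);
}

// Incremented by a timer while the nested acquire waits: the event loop is not blocked
let ticks = 0;

async function validateInventory(orderId: string): Promise<boolean> {
  // Same key as the caller: only free once processOrder releases, which happens after this returns
  return acquireLock(`order:${orderId}`, 200);
}

async function processOrder(orderId: string): Promise<boolean> {
  if (!(await acquireLock(`order:${orderId}`, 200))) return false;
  const ticker = setInterval(() => ticks++, 20);
  try {
    // processOrder is suspended here; the release in finally has not run yet
    return await validateInventory(orderId);
  } finally {
    clearInterval(ticker);
    releaseLock(`order:${orderId}`);
  }
}
```

Calling processOrder('1001') resolves to false after roughly 200 ms, while the tick counter keeps increasing the whole time.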

Reentrant Lock Implementation

A reentrant lock tracks the holder's identity and a reentry count. The same holder can acquire the lock multiple times; it is released only when the reentry count reaches zero.

Use a Redis Hash to store both the holder token and the count:

lua
-- KEYS[1] = lock key
-- ARGV[1] = holder UUID
-- ARGV[2] = TTL in milliseconds

-- Acquire or reenter
local current_holder = redis.call('HGET', KEYS[1], 'holder')
if current_holder == false then
  -- Lock is free: acquire it
  redis.call('HSET', KEYS[1], 'holder', ARGV[1], 'count', 1)
  redis.call('PEXPIRE', KEYS[1], ARGV[2])
  return 1  -- acquired
elseif current_holder == ARGV[1] then
  -- Same holder: reenter
  local count = redis.call('HINCRBY', KEYS[1], 'count', 1)
  redis.call('PEXPIRE', KEYS[1], ARGV[2])  -- reset TTL
  return count  -- new reentry count as an integer reply
else
  -- Different holder: cannot acquire
  return 0
end

lua
-- KEYS[1] = lock key
-- ARGV[1] = holder UUID

-- Release (decrement count; delete when count reaches 0)
local current_holder = redis.call('HGET', KEYS[1], 'holder')
if current_holder ~= ARGV[1] then
  return -1  -- not the holder; error
end
local count = redis.call('HINCRBY', KEYS[1], 'count', -1)
if count <= 0 then
  redis.call('DEL', KEYS[1])
  return 0  -- fully released
end
return count  -- still held (reentry count > 0)
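To check the counter transitions without a Redis server, the two scripts can be modeled in memory. This is a sketch under stated assumptions: a Map stands in for the Redis Hash, TTLs are ignored, and reentrantAcquire / reentrantRelease are hypothetical names; the return codes follow the scripts above.

```typescript
// In-memory model of the reentrant acquire/release scripts (no Redis, no TTL)
type LockHash = { holder: string; count: number };
const store = new Map<string, LockHash>();

// Mirrors the acquire script: 1 = acquired, n > 1 = reentered (new count), 0 = held by another holder
function reentrantAcquire(key: string, holder: string): number {
  const current = store.get(key);
  if (current === undefined) {
    store.set(key, { holder, count: 1 });
    return 1;
  }
  if (current.holder === holder) {
    current.count += 1;
    return current.count;
  }
  return 0;
}

// Mirrors the release script: -1 = not the holder, 0 = fully released, n > 0 = still held
function reentrantRelease(key: string, holder: string): number {
  const current = store.get(key);
  if (current === undefined || current.holder !== holder) return -1;
  current.count -= 1;
  if (current.count <= 0) {
    store.delete(key);
    return 0;
  }
  return current.count;
}

const trace = [
  reentrantAcquire('lock:order:1001', 'A'), // 1: acquired
  reentrantAcquire('lock:order:1001', 'A'), // 2: reentered
  reentrantAcquire('lock:order:1001', 'B'), // 0: B is refused
  reentrantRelease('lock:order:1001', 'A'), // 1: still held
  reentrantRelease('lock:order:1001', 'A'), // 0: fully released, key deleted
  reentrantRelease('lock:order:1001', 'A'), // -1: A no longer holds it
];
```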

typescript
import Redis from 'ioredis';
import { AsyncLocalStorage } from 'async_hooks';
import { randomUUID } from 'crypto';

const redis = new Redis();

// reentrantAcquireScript / reentrantReleaseScript hold the two Lua scripts above as strings
const REENTRANT_ACQUIRE_SHA = (await redis.script('LOAD', reentrantAcquireScript)) as string;
const REENTRANT_RELEASE_SHA = (await redis.script('LOAD', reentrantReleaseScript)) as string;

// Node.js has no thread ID to identify the holder, so use a per-async-context token
// stored in AsyncLocalStorage: every await inside lockStore.run() sees the same token
const lockStore = new AsyncLocalStorage<{ token: string; locks: Set<string> }>();

async function acquireReentrantLock(resource: string, ttlMs: number): Promise<boolean> {
  const context = lockStore.getStore();
  if (!context) throw new Error('Must run inside lockStore.run()');

  const result = await redis.evalsha(
    REENTRANT_ACQUIRE_SHA, 1,
    `lock:${resource}`,
    context.token,
    String(ttlMs)
  ) as number;

  if (result >= 1) {
    context.locks.add(resource);
    return true;   // 1 = first acquisition, >1 = reentry
  }
  return false;    // 0 = held by another context
}

async function releaseReentrantLock(resource: string): Promise<void> {
  const context = lockStore.getStore();
  if (!context) throw new Error('Must run inside lockStore.run()');

  const result = await redis.evalsha(
    REENTRANT_RELEASE_SHA, 1,
    `lock:${resource}`,
    context.token
  ) as number;

  if (result === -1) {
    // Not the holder: the TTL expired or this context never acquired the lock
    throw new Error(`Lock ${resource} is not held by this context`);
  }
  if (result === 0) {
    context.locks.delete(resource);  // count reached zero; the key was deleted
  }
}

// Usage: wrap the entire request/job in lockStore.run()
async function handleRequest(requestId: string) {
  const token = randomUUID();
  await lockStore.run({ token, locks: new Set() }, async () => {
    await processOrder('1001');  // can call acquireReentrantLock('order:1001') inside
  });
}
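To see AsyncLocalStorage carry the holder identity across awaits, the following sketch swaps Redis for an in-memory Map so it runs without a server. tryAcquire and release are hypothetical stand-ins for acquireReentrantLock and releaseReentrantLock, and TTLs are ignored.

```typescript
import { AsyncLocalStorage } from 'async_hooks';
import { randomUUID } from 'crypto';

// Hypothetical in-memory stand-in for the Redis Hash lock: key -> { holder, count }, no TTL
const locks = new Map<string, { holder: string; count: number }>();
const lockStore = new AsyncLocalStorage<{ token: string }>();

function currentToken(): string {
  const ctx = lockStore.getStore();
  if (!ctx) throw new Error('Must run inside lockStore.run()');
  return ctx.token;
}

// Same decision logic as the acquire script: free -> take, same token -> reenter, else refuse
function tryAcquire(resource: string): boolean {
  const token = currentToken();
  const entry = locks.get(resource);
  if (!entry) {
    locks.set(resource, { holder: token, count: 1 });
    return true;
  }
  if (entry.holder === token) {
    entry.count += 1;
    return true;
  }
  return false;
}

function release(resource: string): void {
  const entry = locks.get(resource);
  if (!entry || entry.holder !== currentToken()) throw new Error(`Not the holder of ${resource}`);
  entry.count -= 1;
  if (entry.count === 0) locks.delete(resource);
}

const outcomes: string[] = [];

async function validateInventory(): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(() => resolve(), 5)); // async hop
  const ok = tryAcquire('order:1001'); // same token as processOrder, so this is a reentry
  outcomes.push(`inner:${ok}`);
  if (ok) release('order:1001');
}

async function processOrder(): Promise<void> {
  const ok = tryAcquire('order:1001');
  outcomes.push(`outer:${ok}`);
  if (!ok) return;
  try {
    await validateInventory();
    // A different request runs in its own context with its own token, so it is refused
    await lockStore.run({ token: randomUUID() }, async () => {
      outcomes.push(`other:${tryAcquire('order:1001')}`);
    });
  } finally {
    release('order:1001');
  }
}
```

Running processOrder inside lockStore.run() records outer:true, inner:true, other:false and leaves no lock behind. One consequence of this design is worth checking in your own code: everything inside a single lockStore.run() shares one token, so parallel branches of the same request (for example under Promise.all) reenter each other's locks instead of excluding each other.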

Lock Hierarchies and Deadlock

Deadlock occurs when two processes each hold a lock the other needs:

Process A holds: lock:account:1001
Process B holds: lock:account:1002
Process A waits: lock:account:1002  (held by B)
Process B waits: lock:account:1001  (held by A)
→ Deadlock: neither can proceed

The Consistent Ordering Solution

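If all processes acquire multiple locks in the same globally consistent order, circular waits are impossible: a cycle would require at least one process to wait for a lock that sorts before a lock it already holds, and the ordering rule forbids exactly that.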

Rule: Always acquire locks in ascending order of a canonical key (alphabetical order, numeric ID order, or a predetermined priority order).

typescript
async function transferBetweenAccounts(
  fromAccountId: string,
  toAccountId: string,
  amount: number
) {
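  // Self-transfers would take the same non-reentrant lock twice and fail on lock 2
  if (fromAccountId === toAccountId) throw new Error('Cannot transfer to the same account');

  // Always acquire in sorted order to prevent deadlock (default sort() is lexicographic;
  // any total order works as long as every code path uses the same one)
  const sortedIds = [fromAccountId, toAccountId].sort();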

  const lock1 = await acquireLock(`account:${sortedIds[0]}`, 10000);
  if (!lock1) throw new Error('Could not acquire lock 1');

  const lock2 = await acquireLock(`account:${sortedIds[1]}`, 10000);
  if (!lock2) {
    await lock1.release();
    throw new Error('Could not acquire lock 2');
  }

  try {
    await executeTransfer(fromAccountId, toAccountId, amount);
  } finally {
    await lock2.release();
    await lock1.release();
  }
}

Regardless of which account is "from" and which is "to", both processes always acquire account:1001 before account:1002. If Process A and Process B are both transferring between 1001 and 1002, they will queue up — one waits for the other, no circular wait.

The ordering must be globally consistent across all lock sites. A mix of orderings (some code sorts alphabetically, other code sorts by timestamp) defeats the pattern.
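The same rule generalizes to any number of locks. A minimal sketch, assuming an in-memory Set in place of Redis and a hypothetical withLocksInOrder helper: it deduplicates the keys (which also makes self-transfers safe), sorts them, acquires them in that order, and releases in reverse. Two transfers in opposite directions then run one after the other instead of deadlocking.

```typescript
// Hypothetical in-memory lock table standing in for Redis SET NX (no TTL)
const held = new Set<string>();
const acquisitionLog: string[] = [];

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Retries every 5 ms until the key is free or timeoutMs elapses
async function acquireLock(key: string, timeoutMs = 1000): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!held.has(key)) {
      held.add(key);
      acquisitionLog.push(key);
      return true;
    }
    await sleep(5);
  }
  return false;
}

// Acquires all keys in one canonical order (deduplicated, then sorted), runs fn,
// and releases in reverse order, including after a failed acquisition
async function withLocksInOrder<T>(keys: string[], fn: () => Promise<T>): Promise<T> {
  const ordered = Array.from(new Set(keys)).sort();
  const taken: string[] = [];
  try {
    for (const key of ordered) {
      if (!(await acquireLock(key))) throw new Error(`Could not acquire ${key}`);
      taken.push(key);
    }
    return await fn();
  } finally {
    for (const key of taken.reverse()) held.delete(key);
  }
}

async function transfer(from: string, to: string): Promise<string> {
  return withLocksInOrder([`account:${from}`, `account:${to}`], async () => {
    await sleep(20); // simulated transfer work while both locks are held
    return `${from}->${to}`;
  });
}
```

Running transfer('1001', '1002') and transfer('1002', '1001') concurrently, both take account:1001 first, so the second simply waits for the first to finish.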

Lock Timeout as Deadlock Prevention

A simpler approach: set an acquisition timeout. If you cannot acquire a lock within N milliseconds, fail fast instead of waiting indefinitely.

typescript
async function acquireLockWithTimeout(
  resource: string,
  ttlMs: number,
  timeoutMs: number
): Promise<Lock | null> {
  // One token for the whole attempt: release must present the same value that SET stored
  const token = randomUUID();
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    const result = await redis.set(`lock:${resource}`, token, 'NX', 'PX', ttlMs);
    if (result === 'OK') return buildLock(resource, token);  // buildLock: assumed helper wrapping key + token

    const remaining = deadline - Date.now();
    if (remaining <= 0) break;

    // Wait a short interval before retrying
    await new Promise(r => setTimeout(r, Math.min(50, remaining)));
  }

  return null;  // Timed out: give up instead of waiting indefinitely (potential deadlock avoided)
}

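Lock timeouts turn a potential deadlock into a bounded failure: the operation returns an error instead of hanging forever. That is usually the right production behaviour, since the caller can back off and retry or report the error. Timeouts do not remove the conflicting acquisition order, though; under sustained contention, processes can keep timing out and retrying, so use them alongside consistent ordering rather than instead of it.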
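The effect is easy to observe by deliberately acquiring two locks in opposite orders with a timeout on each. In this sketch a Set stands in for Redis and lockBoth is a hypothetical helper that skips the sorting step on purpose; both calls return within a bounded time instead of hanging, and at least one of them reports a timeout.

```typescript
// Hypothetical in-memory lock table (no Redis, no TTL) to isolate the failure mode
const held = new Set<string>();
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Polls until the key is free or timeoutMs elapses; false means "gave up"
async function acquireLockWithTimeout(key: string, timeoutMs: number): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!held.has(key)) {
      held.add(key);
      return true;
    }
    await sleep(Math.min(10, Math.max(0, deadline - Date.now())));
  }
  return false;
}

// Takes the locks in the order given (deliberately NOT sorted) with a 100 ms timeout each
async function lockBoth(first: string, second: string): Promise<'ok' | 'timeout'> {
  if (!(await acquireLockWithTimeout(first, 100))) return 'timeout';
  try {
    await sleep(20); // widen the window so each task holds its first lock
    if (!(await acquireLockWithTimeout(second, 100))) return 'timeout';
    held.delete(second);
    return 'ok';
  } finally {
    held.delete(first);
  }
}
```

Which call, if any, eventually succeeds depends on timer ordering, so treat the timeout as a retryable error rather than assuming a particular winner.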

Mutex vs Semaphore

The locks covered so far are mutexes — exclusive access, one holder at a time. Sometimes you need to allow N concurrent holders (e.g., limit to 5 concurrent database connections from a particular service).

A counting semaphore in Redis:

typescript
async function acquireSemaphore(
  resource: string,
  maxConcurrent: number,
  ttlMs: number
): Promise<string | null> {
  const token = randomUUID();
  const key = `semaphore:${resource}`;

  const semaphoreAcquireScript = `
    local count = redis.call('SCARD', KEYS[1])
    if count < tonumber(ARGV[2]) then
      redis.call('SADD', KEYS[1], ARGV[1])
      redis.call('PEXPIRE', KEYS[1], ARGV[3])
      return 1
    end
    return 0
  `;

  const result = await redis.eval(
    semaphoreAcquireScript, 1,
    key, token, String(maxConcurrent), String(ttlMs)
  );

  return result === 1 ? token : null;
}

async function releaseSemaphore(resource: string, token: string): Promise<void> {
  await redis.srem(`semaphore:${resource}`, token);
}

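Limitation: PEXPIRE applies to the whole Set, not to individual tokens. Every successful acquire refreshes the TTL for all holders, so a holder that crashes without calling releaseSemaphore keeps its slot for as long as other clients keep acquiring, and capacity leaks under steady traffic. Conversely, if no acquire happens for ttlMs, the entire Set expires, including the tokens of holders that are still working, and the semaphore can admit more than maxConcurrent holders. Strict limits with crash recovery need a per-holder expiry (for example, a Sorted Set scored by expiry time, with expired members removed before counting), plus periodic renewal for long-running holders.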
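Per-holder expiry is usually built on a Redis Sorted Set (member = token, score = expiry time), a technique swapped in here rather than the module's Set-based script. The sketch below models that logic in memory so it runs without a server; ExpiringSemaphore and its injectable clock are hypothetical names.

```typescript
import { randomUUID } from 'crypto';

// In-memory model of a semaphore with per-holder expiry. In Redis this is commonly a
// Sorted Set (member = token, score = expiry time); here a Map and an injectable clock stand in.
class ExpiringSemaphore {
  private holders = new Map<string, number>(); // token -> expiry timestamp in ms

  constructor(
    private maxConcurrent: number,
    private ttlMs: number,
    private now: () => number = Date.now
  ) {}

  acquire(): string | null {
    const t = this.now();
    // Evict holders whose own TTL has passed (ZREMRANGEBYSCORE key -inf now, in Redis terms)
    this.holders.forEach((expiresAt, token) => {
      if (expiresAt <= t) this.holders.delete(token);
    });
    if (this.holders.size >= this.maxConcurrent) return null;
    const token = randomUUID();
    this.holders.set(token, t + this.ttlMs);
    return token;
  }

  // A long-running holder extends only its own slot; false if the slot already expired
  renew(token: string): boolean {
    const t = this.now();
    const expiresAt = this.holders.get(token);
    if (expiresAt === undefined || expiresAt <= t) return false;
    this.holders.set(token, t + this.ttlMs);
    return true;
  }

  release(token: string): void {
    this.holders.delete(token);
  }
}
```

In Redis, the eviction (ZREMRANGEBYSCORE), count (ZCARD), and insert (ZADD) would need to run inside one Lua script to stay atomic, in the same way as the Set-based script above.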

The Watchdog Pattern for Long-Held Locks

When a lock must be held for an indeterminate duration (e.g., while streaming a large file), use a watchdog that extends the lock while work is ongoing:

typescript
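// extendLock / releaseLock: assumed helpers that extend or delete the key only if it
// still holds this.value (compare-and-set Lua scripts)
class WatchdogLock {
  private watchdog: NodeJS.Timeout | null = null;

  constructor(
    private key: string,
    private value: string,
    private ttlMs: number,
    private onLost: () => void = () => {}  // called once if the lock is lost; abort the work here
  ) {}

  startWatchdog() {
    // Renew at a third of the TTL so one slow or failed round trip does not lose the lock
    const renewInterval = this.ttlMs / 3;
    this.watchdog = setInterval(async () => {
      let extended: boolean;
      try {
        extended = await extendLock(this.key, this.value, this.ttlMs);
      } catch (err) {
        // Transient Redis error: keep the timer and retry on the next tick
        console.error(`Lock ${this.key} renewal failed`, err);
        return;
      }
      // this.watchdog is null if release() ran while the renewal was in flight
      if (!extended && this.watchdog) {
        console.error(`Lock ${this.key} could not be extended: lock was lost`);
        this.stopWatchdog();
        this.onLost();
      }
    }, renewInterval);
  }

  stopWatchdog() {
    if (this.watchdog) {
      clearInterval(this.watchdog);
      this.watchdog = null;
    }
  }

  async release() {
    this.stopWatchdog();
    return releaseLock(this.key, this.value);
  }
}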

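The watchdog renews the lock TTL for as long as the holder process is alive and its timers keep firing. If the holder crashes, the watchdog stops with it, the lock expires naturally, and another process can acquire it. A stall longer than the TTL (for example a long GC pause or a blocked event loop) can still let the lock expire mid-work, so the work must stop as soon as an extension fails.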
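The watchdog is only useful if the work actually stops when an extension fails. One way to deliver that signal, sketched here as an assumption rather than the module's implementation, is an AbortController whose signal the work checks between steps. The lock store, extendLock, withWatchdog, and streamChunks are hypothetical in-memory stand-ins, so the example runs without Redis.

```typescript
// Hypothetical in-memory lock store: key -> { value, expiresAt }. No Redis involved.
const store = new Map<string, { value: string; expiresAt: number }>();

// Compare-and-extend: only the current, unexpired holder may push the expiry forward
async function extendLock(key: string, value: string, ttlMs: number): Promise<boolean> {
  const entry = store.get(key);
  if (!entry || entry.value !== value || entry.expiresAt <= Date.now()) return false;
  entry.expiresAt = Date.now() + ttlMs;
  return true;
}

// Runs work while a watchdog renews the lock every ttlMs / 3; aborts the work if renewal fails
async function withWatchdog<T>(
  key: string,
  value: string,
  ttlMs: number,
  work: (signal: AbortSignal) => Promise<T>
): Promise<T> {
  const controller = new AbortController();
  const timer = setInterval(() => {
    extendLock(key, value, ttlMs)
      .then((ok) => {
        if (!ok) controller.abort();
      })
      .catch(() => controller.abort()); // conservative: treat a failed renewal as a lost lock
  }, ttlMs / 3);
  try {
    return await work(controller.signal);
  } finally {
    clearInterval(timer);
  }
}

// Example work: processes chunks of 20 ms each, checking the signal between them
async function streamChunks(signal: AbortSignal, chunks: number): Promise<number> {
  let done = 0;
  for (let i = 0; i < chunks; i++) {
    if (signal.aborted) throw new Error('lock lost: aborting');
    await new Promise<void>((resolve) => setTimeout(() => resolve(), 20));
    done++;
  }
  return done;
}
```

The abort is cooperative: the work stops only at points where it checks signal.aborted, or where it passes the signal to an API that accepts one.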

Summary

Reentrant locks — use a Hash storing holder UUID + reentry count; same holder can reacquire; fully released when count reaches zero
AsyncLocalStorage in Node.js carries the holder token across async context for reentrant lock identity
Deadlock from multiple locks — prevented by consistent ordering: always acquire locks sorted by a canonical key across all code paths
Lock timeouts — set an acquisition timeout to convert potential deadlocks into fast failures
Mutex vs Semaphore — a counting semaphore (Set-based) allows N concurrent holders for rate-limiting concurrency
Watchdog pattern — background timer extends lock TTL while holder is alive; lock expires on crash without explicit release

Next: A-6 — The SupraScan Architecture: Coordinating 10+ Concurrent Scanner Instances — the real-world distributed coordination problem from building a production blockchain indexer at 2,000+ TPS.
