Module A-12 · 20 min read

Active-active (CRDT-based) vs active-passive multi-region Redis, Redis Enterprise geo-replication, conflict resolution strategies, latency-vs-consistency trade-offs, and when global distribution is the wrong answer.

A-12 — Multi-Region Redis: Active-Active and Geo-Replication

Who this module is for: You serve users across multiple geographic regions and need Redis to be close to each user — low-latency reads and writes from any region. This module covers the two multi-region Redis architectures (active-passive and active-active), the CRDT-based conflict resolution that enables active-active, and when global Redis distribution creates more problems than it solves.

Why Multi-Region Redis

A Redis instance in us-east-1 adds 120ms+ of round-trip time for a user in ap-southeast-1 — completely negating Redis's sub-millisecond advantage over a database. For globally distributed applications, you need Redis deployed close to each user group.

Two architectural options:

Active-Passive:  One region writes, other regions read (with replication lag)
Active-Active:   All regions write; changes propagate and conflicts are resolved

Active-Passive: Regional Read Replicas

The simplest multi-region setup: the primary Redis is in one region; replicas in other regions serve reads with some lag.

us-east-1 (Primary): accepts all writes
      │
      │ async replication (100–200ms cross-region latency)
      ▼
eu-west-1 (Replica): serves reads (100–200ms behind primary)
ap-southeast-1 (Replica): serves reads (150–250ms behind primary)

Read path: Applications in eu-west-1 read from the local replica with < 1ms latency.

Write path: All writes go to us-east-1. An API call from ap-southeast-1 that writes data pays 150–250ms of round-trip time to reach the primary, which is unacceptable for write-heavy workloads.
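
The read and write paths above amount to a small routing rule. A minimal sketch, assuming the three regions from the diagram; the hostnames and the `endpointFor` helper are hypothetical names chosen for illustration:

```typescript
type Region = 'us-east-1' | 'eu-west-1' | 'ap-southeast-1';

// Hypothetical endpoints for illustration; substitute your own hostnames.
const PRIMARY_REGION: Region = 'us-east-1';
const ENDPOINTS: Record<Region, string> = {
  'us-east-1': 'redis-primary.us-east-1.example.internal',
  'eu-west-1': 'redis-replica.eu-west-1.example.internal',
  'ap-southeast-1': 'redis-replica.ap-southeast-1.example.internal',
};

// Writes always go to the single primary; reads stay in the caller's region.
function endpointFor(callerRegion: Region, op: 'read' | 'write'): string {
  return op === 'write' ? ENDPOINTS[PRIMARY_REGION] : ENDPOINTS[callerRegion];
}
```

A caller in ap-southeast-1 gets its local replica for reads and the us-east-1 primary for writes, which is where the cross-region write penalty comes from.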

Failover: If the primary fails, you manually promote a replica with REPLICAOF NO ONE, repoint the remaining replicas at it, and update your application's connection strings. Sentinel can automate this within a single region, but it is a poor fit across regions because WAN latency and transient partitions make failure detection unreliable.
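
Before a manual promotion you have to decide which replica to promote. One common heuristic, which is an assumption here rather than something this module prescribes, is to pick the replica that has applied the most of the primary's write stream. The sketch below parses the text returned by `INFO replication` on each replica and compares the `slave_repl_offset` field; `parseInfo` and `pickPromotionCandidate` are hypothetical helper names:

```typescript
// Parse the text returned by `INFO replication` into key/value pairs.
function parseInfo(info: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const line of info.split(/\r?\n/)) {
    if (line === '' || line.startsWith('#')) continue; // skip blanks and section headers
    const idx = line.indexOf(':');
    if (idx > 0) fields[line.slice(0, idx)] = line.slice(idx + 1);
  }
  return fields;
}

interface ReplicaInfo {
  name: string; // label for the replica, e.g. its region
  info: string; // raw output of INFO replication from that replica
}

// Pick the replica with the highest replication offset, i.e. the one that
// has applied the most of the failed primary's writes.
function pickPromotionCandidate(replicas: ReplicaInfo[]): string | null {
  let best: string | null = null;
  let bestOffset = -1;
  for (const r of replicas) {
    const offset = Number(parseInfo(r.info)['slave_repl_offset']);
    if (Number.isFinite(offset) && offset > bestOffset) {
      bestOffset = offset;
      best = r.name;
    }
  }
  return best;
}
```

Once a candidate is chosen, REPLICAOF NO ONE is run against it and clients are repointed, as described above. Writes that never reached any replica are lost either way.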

When active-passive is appropriate:

Read-heavy workloads where most data is read many times and written rarely
Data that is written in one primary region only (for example, content authored in one region and consumed everywhere)
Cache workloads where cross-region write latency is acceptable

Active-Active: Every Region Writes Locally

In active-active mode, each regional Redis instance accepts writes. Changes propagate to other regions asynchronously. When two regions write to the same key simultaneously, conflict resolution determines the outcome.

us-east-1 (Primary 1): accepts writes from US users
eu-west-1 (Primary 2): accepts writes from EU users
ap-southeast-1 (Primary 3): accepts writes from APAC users
          │           │           │
          └───────────┴───────────┘
              bidirectional replication
              (100–300ms cross-region)

Write path: Each region writes locally with < 1ms latency. The write propagates to other regions asynchronously.

Conflict scenario:

T=0ms:  US user sets key "product:999:stock" to 5
T=0ms:  EU user simultaneously sets key "product:999:stock" to 3
T=200ms: US change arrives in EU, EU change arrives in US
→ What is the final value? 5? 3? Something else?
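
Without a conflict-resolution rule, the answer depends on which region you ask. A minimal sketch of the scenario, assuming a hypothetical `NaiveReplica` class that simply applies every SET in the order it arrives:

```typescript
// A replica that blindly applies every SET in arrival order (no conflict resolution).
class NaiveReplica {
  private data = new Map<string, string>();
  set(key: string, value: string): void {
    this.data.set(key, value);
  }
  get(key: string): string | undefined {
    return this.data.get(key);
  }
}

const us = new NaiveReplica();
const eu = new NaiveReplica();

// T=0ms: each region applies its own local write first.
us.set('product:999:stock', '5');
eu.set('product:999:stock', '3');

// T=200ms: each region applies the other region's write when it arrives.
us.set('product:999:stock', '3'); // EU's write reaches US
eu.set('product:999:stock', '5'); // US's write reaches EU

const usFinal = us.get('product:999:stock'); // US now holds EU's value
const euFinal = eu.get('product:999:stock'); // EU now holds US's value
```

The two regions have swapped values and will never agree again unless another write happens. Any usable active-active design needs a merge rule that gives the same result in every region regardless of arrival order.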

CRDTs: Conflict-Free Replicated Data Types

CRDTs (Conflict-Free Replicated Data Types) are data structures designed so that concurrent operations from different replicas can be merged without conflicts — regardless of order.

Redis Enterprise Active-Active geo-distribution (also available in Redis Cloud) implements CRDT semantics for the standard Redis data types:

Redis Type	CRDT Semantics
String	Last-Write-Wins (LWW) based on logical timestamp
Counter (INCR)	All increments are summed across regions
Set	All SADDs are merged (union); removes follow the observed-remove rule, so a concurrent add wins over a remove
Sorted Set	All ZADDs are merged; conflicts resolved by LWW
Hash	Field-level LWW — different fields can come from different regions
List	Append-only merging; ordering by logical timestamp

Last-Write-Wins (LWW)

The write with the highest logical timestamp (Lamport clock or hybrid logical clock) wins. For String types:

US at T=1000: SET product:999:stock "5"
EU at T=1001: SET product:999:stock "3"   (slightly later timestamp)
→ EU's write wins: final value = "3"

LWW is simple but has a failure mode: when two writes land within the clock-synchronisation tolerance of each other, the winner is decided by clock skew or a tie-break rule, which is arbitrary from the application's perspective, and the losing write is silently discarded.
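
The LWW rule can be sketched as a merge function over timestamped values. This is an illustrative model, not Redis Enterprise's internal implementation; the `Stamp` shape and the replica-ID tie-break are assumptions:

```typescript
interface Stamp {
  time: number;      // logical timestamp
  replicaId: string; // which region issued the write
}

interface LwwRegister {
  value: string;
  stamp: Stamp;
}

// Total order on stamps: higher logical time wins; replica ID breaks exact ties.
function isNewer(a: Stamp, b: Stamp): boolean {
  return a.time > b.time || (a.time === b.time && a.replicaId > b.replicaId);
}

// Given unique (time, replicaId) stamps, this merge gives the same result in
// either argument order, so every replica converges on the same value.
function mergeLww(a: LwwRegister, b: LwwRegister): LwwRegister {
  return isNewer(b.stamp, a.stamp) ? b : a;
}

const usWrite: LwwRegister = { value: '5', stamp: { time: 1000, replicaId: 'us' } };
const euWrite: LwwRegister = { value: '3', stamp: { time: 1001, replicaId: 'eu' } };

const atUs = mergeLww(usWrite, euWrite); // EU's later stamp wins
const atEu = mergeLww(euWrite, usWrite); // same winner in the other order
```

Note that the tie-break is exactly the arbitrary part: when both stamps carry the same time, the region whose ID sorts higher wins, and the other write disappears without an error.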

Counter Convergence

For INCR/DECR, CRDTs sum all increments from all regions:

US: INCRBY inventory 100
EU: INCRBY inventory 50
APAC: DECRBY inventory 30
→ Convergent value: 100 + 50 - 30 = 120 (regardless of order or timing)

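These are the right semantics for additive counters such as page views, score increments, and inventory adjustments. Convergence does not enforce invariants, though: two regions can each decrement the last unit of stock, and the merged counter simply goes negative.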
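
One textbook way to get this behaviour is a PN-counter, where each region tracks its own increments and decrements and a merge takes the per-region maximum. The sketch below is that general construction, offered as an illustration rather than a description of how Redis Enterprise stores counters; all names are hypothetical:

```typescript
// State-based PN-counter: per-replica totals of increments and decrements.
interface PnCounter {
  inc: Record<string, number>;
  dec: Record<string, number>;
}

function emptyCounter(): PnCounter {
  return { inc: {}, dec: {} };
}

// Apply a local INCRBY (positive delta) or DECRBY (negative delta) at one replica.
function applyDelta(c: PnCounter, replicaId: string, delta: number): PnCounter {
  const next: PnCounter = { inc: { ...c.inc }, dec: { ...c.dec } };
  if (delta >= 0) next.inc[replicaId] = (next.inc[replicaId] ?? 0) + delta;
  else next.dec[replicaId] = (next.dec[replicaId] ?? 0) - delta;
  return next;
}

// Per-replica maximum: safe to apply in any order, any number of times.
function mergeMax(a: Record<string, number>, b: Record<string, number>): Record<string, number> {
  const out: Record<string, number> = { ...a };
  for (const id of Object.keys(b)) out[id] = Math.max(out[id] ?? 0, b[id] ?? 0);
  return out;
}

function mergeCounters(a: PnCounter, b: PnCounter): PnCounter {
  return { inc: mergeMax(a.inc, b.inc), dec: mergeMax(a.dec, b.dec) };
}

function counterValue(c: PnCounter): number {
  const sum = (m: Record<string, number>): number =>
    Object.keys(m).reduce((s, k) => s + (m[k] ?? 0), 0);
  return sum(c.inc) - sum(c.dec);
}

const usC = applyDelta(emptyCounter(), 'us', 100);
const euC = applyDelta(emptyCounter(), 'eu', 50);
const apacC = applyDelta(emptyCounter(), 'apac', -30);
const converged = counterValue(mergeCounters(mergeCounters(usC, euC), apacC)); // 100 + 50 - 30
```

Because the merge is a per-region maximum, receiving the same update twice or in a different order does not change the result.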

Set Merge Semantics

For Sets, the CRDT merges all adds and respects the "observed-remove" rule — a delete only removes adds that the deleting replica has observed:

US: SADD tags "redis"       → {redis}
EU: SADD tags "postgres"    → {postgres}
(changes propagate)
Final: {redis, postgres}    ← union

US: SREM tags "redis"       → removes the "redis" add that US observed
EU: SADD tags "redis"       → adds "redis" again simultaneously
(changes propagate)
Final: {redis, postgres}    ← EU's add survives because US removed only the add it had observed

This can produce unintuitive results (a concurrent add brings back an element that another region just removed), but it guarantees that all regions converge on the same set.
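
The observed-remove rule is easier to follow with unique tags on every add. The sketch below is a generic OR-Set model under that assumption, replaying the two scenarios above; the tag strings and function names are hypothetical:

```typescript
// Observed-remove set: every add carries a unique tag; a remove tombstones
// only the tags the removing replica has already seen.
interface OrSet {
  adds: Record<string, string>;  // tag -> element
  removed: Record<string, true>; // tombstoned tags
}

function emptySet(): OrSet {
  return { adds: {}, removed: {} };
}

function add(s: OrSet, element: string, tag: string): OrSet {
  return { adds: { ...s.adds, [tag]: element }, removed: { ...s.removed } };
}

function remove(s: OrSet, element: string): OrSet {
  const removed: Record<string, true> = { ...s.removed };
  for (const tag of Object.keys(s.adds)) {
    if (s.adds[tag] === element) removed[tag] = true;
  }
  return { adds: { ...s.adds }, removed };
}

// Union of adds and union of tombstones: order-independent and idempotent.
function mergeSets(a: OrSet, b: OrSet): OrSet {
  return { adds: { ...a.adds, ...b.adds }, removed: { ...a.removed, ...b.removed } };
}

function members(s: OrSet): string[] {
  const live = new Set<string>();
  for (const tag of Object.keys(s.adds)) {
    const element = s.adds[tag];
    if (element !== undefined && !s.removed[tag]) live.add(element);
  }
  return Array.from(live).sort();
}

// Scenario 1: concurrent adds merge into a union.
const synced = mergeSets(add(emptySet(), 'redis', 'us-1'), add(emptySet(), 'postgres', 'eu-1'));

// Scenario 2: US removes "redis" while EU concurrently adds it again.
const usAfter = remove(synced, 'redis');       // tombstones tag us-1 only
const euAfter = add(synced, 'redis', 'eu-2');  // new tag US has never seen
const finalMembers = members(mergeSets(usAfter, euAfter));
```

After the merge, the tag `us-1` is tombstoned but `eu-2` is not, so "redis" is still a member in every region.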

Redis Enterprise vs Redis OSS for Active-Active

Open-source Redis does not natively support active-active geo-replication. The replication system is designed for primary-replica, not peer-to-peer.

Redis Enterprise (commercial product from Redis Ltd.) provides:

Active-Active geo-distribution with CRDT semantics
Automatic conflict resolution
Global keyspace — all regions share the same logical database
WAN-optimised replication (delta compression, bandwidth throttling)

Redis Cloud (managed Redis Enterprise) is the SaaS offering.

Alternatives for OSS Redis:

Roshi (SoundCloud's approach): application-layer LWW-element-set CRDT built on Sorted Sets with timestamp scores; reads merge the results from multiple Redis clusters
Application-level coordination: accept that writes go to one authoritative region and reads may be stale in other regions
Consistent hashing + per-region primary: different keys are "owned" by different regions; cross-region reads accept the latency
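
The per-region ownership option needs a stable rule that maps every key to exactly one owning region. The sketch below uses rendezvous (highest-random-weight) hashing as a stand-in for a consistent-hash ring, since it is shorter to write and has the same property that removing a region only reassigns the keys that region owned. The hash function and the `ownerRegion` helper are assumptions for illustration:

```typescript
// FNV-1a-style 32-bit string hash (chosen for brevity; any stable hash works).
function fnv1a(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Rendezvous hashing: score every region for the key and pick the highest score.
function ownerRegion(key: string, regions: string[]): string {
  let owner: string | undefined;
  let best = -1;
  for (const region of regions) {
    const score = fnv1a(`${region}|${key}`);
    if (score > best) {
      best = score;
      owner = region;
    }
  }
  if (owner === undefined) throw new Error('no regions configured');
  return owner;
}

const REGIONS = ['us-east-1', 'eu-west-1', 'ap-southeast-1'];
const owner = ownerRegion('user:1001:profile', REGIONS);
```

Writes for a key go to the primary in `owner`; other regions either read a local replica of that primary or pay the cross-region round trip.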

The Consistency Trade-off

Active-active replication is eventually consistent. In the window between a write being applied in one region and reaching the others (100–300ms cross-region), users in different regions can see different values:

US user writes: SET username:1001 "Jatin"
150ms later, EU user reads: GET username:1001
→ EU replica: "OldName"  (replication hasn't arrived yet)
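
The staleness window can be modelled with a few lines of arithmetic. A minimal sketch, assuming a fixed one-way replication lag of 200ms and a hypothetical `visibleAt` helper; real lag varies from moment to moment:

```typescript
interface ReplicatedWrite {
  value: string;
  writtenAtMs: number; // time the write was applied in its origin region
}

// What a remote region returns at `readAtMs`, given writes made elsewhere and
// a fixed one-way replication lag. Writes are assumed to be sorted by time.
function visibleAt(writes: ReplicatedWrite[], readAtMs: number, lagMs: number): string | null {
  let visible: string | null = null;
  for (const w of writes) {
    if (w.writtenAtMs + lagMs <= readAtMs) visible = w.value;
  }
  return visible;
}

const history: ReplicatedWrite[] = [
  { value: 'OldName', writtenAtMs: -60000 }, // written a minute earlier
  { value: 'Jatin', writtenAtMs: 0 },        // the US write from the example
];

const at150 = visibleAt(history, 150, 200); // new write is still in flight
const at250 = visibleAt(history, 250, 200); // new write has arrived
```

The EU read at 150ms still sees the old name; the same read 100ms later sees the new one.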

For some use cases this is acceptable:

Product recommendations (briefly stale is fine)
Feature flags (a user in EU and a user in US seeing different states for 200ms is acceptable)
Leaderboards (eventual consistency is expected)

For some it is not:

Account balances (EU and US must agree on the balance)
Inventory counts (two regions cannot both sell the last item)
Sessions (a session created in the US must be immediately valid in EU)

For strong consistency across regions: use a database with cross-region transactions (CockroachDB, Spanner, YugabyteDB), not Redis.

Practical Patterns for Multi-Region Redis Without Active-Active

Regional Primary with Global Read Replicas

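typescript
import Redis from 'ioredis';

// Each region has its own Redis primary for local writes,
// plus read replicas of all other regions' primaries.
// Assumption: ioredis client; hostnames are placeholders.
const usRedis = new Redis({ host: 'us-primary.example.internal' });
const euRedis = new Redis({ host: 'eu-replica-of-us.example.internal' });

// When a US user writes: write to the local (US) primary.
async function saveProfile(id: string, data: string): Promise<void> {
  await usRedis.set(`user:${id}:profile`, data);
}

// When a user reads in EU: use the local replica and accept a small lag.
async function loadProfile(id: string): Promise<string | null> {
  const profile = await euRedis.get(`user:${id}:profile`);
  if (profile !== null) return profile;
  // Key missing on the EU replica (for example, a new key still in flight):
  // fall back to the US primary. This catches missing keys only; an
  // overwritten key can still be served stale from the replica.
  return usRedis.get(`user:${id}:profile`);
}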

Sticky Sessions by Region

Route each user's requests to their "home region" for consistent reads and writes. Use geolocation at the CDN/load balancer layer:

US users → us-east-1 Redis (reads and writes)
EU users → eu-west-1 Redis (reads and writes)
# Cross-region migration: replicate with lag; brief inconsistency on region change
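
The routing decision itself is usually a small lookup at the edge. A minimal sketch, assuming the load balancer exposes a continent code for each request and that a user's home region can be pinned in their session; the mapping table and the `homeRegion` helper are hypothetical:

```typescript
type Region = 'us-east-1' | 'eu-west-1' | 'ap-southeast-1';

// Hypothetical mapping from a geolocation continent code to a home region.
const HOME_REGION_BY_CONTINENT: Record<string, Region> = {
  NA: 'us-east-1',
  SA: 'us-east-1',
  EU: 'eu-west-1',
  AF: 'eu-west-1',
  AS: 'ap-southeast-1',
  OC: 'ap-southeast-1',
};

const DEFAULT_REGION: Region = 'us-east-1';

// A pinned home region (stored with the user or session) wins over
// geolocation, so a travelling user keeps reading and writing in one place.
function homeRegion(continent: string | undefined, pinned?: Region): Region {
  if (pinned !== undefined) return pinned;
  const mapped = continent !== undefined ? HOME_REGION_BY_CONTINENT[continent] : undefined;
  return mapped ?? DEFAULT_REGION;
}
```

Pinning trades latency for consistency: a US user travelling in Europe pays the cross-region round trip but never observes the brief inconsistency of a region change.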

Read-Your-Own-Writes via Sticky Routing

After a write, route subsequent reads to the same region (where the write is definitely present) for the next few seconds:

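typescript
import Redis from 'ioredis';

// Assumption: ioredis client; hostnames are placeholders.
const primaryRedis = new Redis({ host: 'redis-primary.example.internal' });
const localReplicaRedis = new Redis({ host: 'redis-replica.example.internal' });
// Writable instance in the user's own region. The tag must live somewhere the
// reader sees immediately; a tag written to the remote primary would lag on
// the replica like any other key.
const localRedis = new Redis({ host: 'redis-local.example.internal' });

async function write(key: string, value: string, userId: string): Promise<void> {
  await primaryRedis.set(key, value);
  // Tag this user as "just wrote" for 5 seconds
  await localRedis.set(`write-tag:${userId}`, '1', 'EX', 5);
}

async function read(key: string, userId: string): Promise<string | null> {
  // EXISTS returns the number of matching keys (0 or 1 here)
  const justWrote = (await localRedis.exists(`write-tag:${userId}`)) === 1;
  if (justWrote) {
    // Route to primary to read our own write
    return primaryRedis.get(key);
  }
  // Route to local replica (possibly stale, but user hasn't written recently)
  return localReplicaRedis.get(key);
}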

Summary

Active-passive: one primary region, read replicas in other regions — low write latency only in the primary region
Active-active: every region writes locally — requires CRDT conflict resolution; supported by Redis Enterprise/Cloud, not OSS Redis
CRDTs resolve conflicts via: LWW for Strings, sum-all-increments for counters, merge for Sets, field-level LWW for Hashes
Active-active is eventually consistent — writes propagate with 100–300ms cross-region lag
Do not use Redis active-active for strong consistency requirements (account balances, inventory) — use a transactional database
OSS Redis alternatives: regional primaries + cross-region read replicas, sticky user routing, application-layer CRDT patterns
The question "should I use multi-region Redis?" often has the answer "no" — regional cache with fallback is simpler and correct for most use cases

Next: A-13 — Disaster Recovery, Backup, and Point-in-Time Restore — RDB backup scheduling, AOF log shipping, recovery runbooks, and testing restore procedures before you need them.
