
Every byte you write passes through WAL. Understanding WAL is understanding your write amplification.

Module 3 — Write-Ahead Logging: Durability, Replication, and the Price of Every Write

What this module covers: The WAL mechanism from first principles — why it exists, what is physically written, how checkpoints and background writes interact, how streaming replication flows from WAL, and the production consequences of misconfiguring any of it. By the end, you will be able to calculate write amplification for any workload and diagnose WAL-related performance problems from pg_stat_* views alone.


Why WAL Exists: The Durability Problem

Consider what happens when you commit a transaction in a naive database implementation.

The data must eventually reach disk. Disk writes are expensive — an 8KB page write on a spinning disk takes 5–10ms. A modern OLTP system might commit thousands of transactions per second. If every commit required flushing all modified heap pages to disk synchronously, throughput would collapse.

But you cannot skip the flush. If the system crashes before modified pages reach disk, those committed transactions are lost. That violates Durability — the D in ACID.

The naive options are both unacceptable:

  • Flush heap pages on every commit → terrible write throughput
  • Don't flush → committed data lost on crash

WAL is the solution to this dilemma.

Instead of flushing heap pages on commit, Postgres flushes a compact sequential record of what changed. This record — the Write-Ahead Log — is small, sequential, and fast to write. Heap pages can be flushed lazily in the background.

On crash recovery, Postgres replays the WAL to reconstruct any committed changes that hadn't yet made it to heap pages. The WAL is the authoritative record of what happened. The heap is a derived, materialized form of that record.

This is the core invariant of WAL: a transaction's WAL record must be on disk before the commit returns to the client. The heap page can wait. The WAL cannot.
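The invariant can be illustrated with a toy key-value store. This is only a sketch of the write-ahead rule, not Postgres's record format or recovery code; the class name ToyStore, the JSON log format, and the file name toy.wal are all invented for the illustration.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy key-value store illustrating the write-ahead rule (not Postgres's format)."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}  # stands in for heap pages held only in memory

    def commit(self, key, value):
        # 1. Append the change to the log and force it to stable storage.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only now update the in-memory state and acknowledge the commit.
        self.data[key] = value

    @classmethod
    def recover(cls, wal_path):
        # After a crash the in-memory state is gone; rebuild it by replaying the log.
        store = cls(wal_path)
        if os.path.exists(wal_path):
            with open(wal_path, encoding="utf-8") as wal:
                for line in wal:
                    record = json.loads(line)
                    store.data[record["key"]] = record["value"]
        return store


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        wal_path = os.path.join(tmp, "toy.wal")
        store = ToyStore(wal_path)
        store.commit("balance:1", 100)
        store.commit("balance:1", 70)
        del store  # simulate a crash: memory is lost, the log survives
        return ToyStore.recover(wal_path).data
```

Calling demo() rebuilds the final state from the log alone. The "heap" (the dict) was never written to disk, yet no committed change is lost, because every commit waited for its log record to be fsynced first.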


Physical Structure of the WAL

Segments, Pages, and Records

WAL is stored in $PGDATA/pg_wal/ as a series of segment files, each 16MB by default (configurable at initdb time via --wal-segsize).

$PGDATA/pg_wal/
├── 000000010000000000000001
├── 000000010000000000000002
├── 000000010000000000000003
└── ...

The filename encodes three components in hexadecimal:

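  • Timeline ID (00000001): identifies the branch of database history; it increments whenever a standby is promoted or a point-in-time recovery completes
  • High 32 bits of the LSN (00000000): which 4GB range of the WAL stream the segment belongs to
  • Segment number within that range (00000001): with 16MB segments this runs from 00000000 to 000000FF before the middle field increments

Each segment is divided into 8KB WAL pages (the same default size as heap pages, though the two are separate build-time settings). Each page has a header. Within pages, WAL is written as a stream of variable-length WAL records.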

LSN: The Coordinate System

Every position in the WAL stream is identified by a Log Sequence Number (LSN) — a 64-bit integer representing the byte offset from the beginning of the WAL.

sql
-- Current write position (how far WAL has been written out of the WAL buffers)
SELECT pg_current_wal_lsn();
--  pg_current_wal_lsn
-- --------------------
--  2/4F3A1820

-- Current flush position (what has been flushed to disk)
SELECT pg_current_wal_flush_lsn();

-- Total WAL generated since the cluster was initialized (LSNs do not reset on restart)
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'));

LSNs appear everywhere in Postgres: in page headers (pd_lsn records the LSN of the last WAL record that modified the page), in replication slots, in recovery targets for PITR, and in monitoring views.
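To make the coordinate system concrete, here is a small sketch that converts the textual LSN form into an integer and derives the segment file that contains it. The function names are hypothetical helpers, and the default 16MB segment size is assumed.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default --wal-segsize


def parse_lsn(text):
    """Convert the 'high/low' hex text form of an LSN to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a, b):
    """Bytes of WAL between two LSNs, in the spirit of pg_wal_lsn_diff(a, b)."""
    return parse_lsn(a) - parse_lsn(b)


def wal_file_name(lsn, timeline=1, segment_size=WAL_SEGMENT_SIZE):
    """Name of the segment file that contains the given LSN."""
    segment_number = parse_lsn(lsn) // segment_size
    segments_per_4gb = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_4gb,  # high 32 bits of the LSN
        segment_number % segments_per_4gb,   # segment within that 4GB range
    )
```

For the LSN 2/4F3A1820 shown earlier, wal_file_name returns 00000001000000020000004F on timeline 1: the middle field is the 2 before the slash, and the last field is the top byte of the part after it.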

Anatomy of a WAL Record

Every WAL record has:

WALRecordHeader:
  xl_tot_len   uint32   — total length of the record
  xl_xid       TransactionId — transaction that generated this record
  xl_prev      XLogRecPtr   — LSN of the previous record (for backward traversal)
  xl_info      uint8        — flags and resource manager subtype
  xl_rmid      RmgrId       — resource manager ID (identifies record type)
  xl_crc       pg_crc32c    — CRC of the record

WALRecordData:
  [variable-length payload — format depends on resource manager]

The resource manager (xl_rmid) identifies what kind of operation this record describes. Key resource managers:

rmid | Name | What it logs
0 | XLOG | Checkpoint records, segment switches, end-of-backup markers
1 | Transaction | COMMIT, ROLLBACK, PREPARE
2 | Storage | Relation file creation and truncation
3 | CLOG | Transaction status page updates
9 | Heap2 | VACUUM, FREEZE, visibility map updates
10 | Heap | INSERT, UPDATE, DELETE, HOT update
11 | Btree | B-tree inserts, splits, page deletions

What an INSERT Actually Writes to WAL

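When you insert a row, the WAL record for the heap contains:

  1. The full tuple data, enough to reconstruct the row on recovery
  2. The target page and offset, which say where the tuple was placed in the heap
  3. The transaction ID, carried in the record header, for visibility tracking

For an UPDATE, a single heap record contains:

  1. The location (page/offset) of the old tuple, so that replay can set its xmax and link it to the new version
  2. The new tuple data. When the old and new versions sit on the same page, Postgres omits the leading and trailing bytes the two versions share
  3. The record type, which distinguishes HOT from non-HOT updates. A HOT update adds no index records; a non-HOT update also writes an insert record for every index on the table

For a DELETE, WAL contains:

  1. A record marking the old tuple's xmax as set

This is why updates are expensive in Postgres. Logically, every update is a delete plus an insert: the old version is stamped, a new version is written, and unless the update is HOT, every index gets a new entry.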

sql
-- Measure WAL generated by a single UPDATE
-- (run in psql on an otherwise idle system; \gset stores the result in a variable)
SELECT pg_current_wal_lsn() AS before \gset
UPDATE transactions SET status = 'confirmed' WHERE id = 1;
SELECT pg_size_pretty(
  pg_wal_lsn_diff(pg_current_wal_lsn(), :'before')
) AS wal_generated;
-- Typical result for a row update with no full page image: 200–400 bytes of WAL

The Write Path: From Memory to Disk

Understanding exactly when data moves from memory to disk is critical for reasoning about both performance and durability.

Shared Buffers and the WAL Buffer

Postgres has two separate in-memory write buffers:

shared_buffers — the main page cache. When you modify a page (insert, update, delete), the modified page lives here until a background writer or checkpoint flushes it to the heap file on disk. A dirty page in shared_buffers is a performance optimization — it defers expensive random writes.

WAL buffers (wal_buffers, default -1, which auto-sizes to 1/32 of shared_buffers, capped at one 16MB WAL segment): a circular buffer in shared memory where WAL records are accumulated before being written to the WAL segment files. WAL records flow from the WAL buffer to disk much more frequently than heap pages.

When WAL Is Flushed

WAL is flushed to disk (fsync'd) at these moments:

  1. Transaction commit — by default (synchronous_commit = on), WAL is flushed before the commit acknowledgement is sent to the client. This is what makes committed transactions durable.

  2. WAL buffer fills: if the circular WAL buffer fills up before a commit, a backend that needs space has to write buffered WAL out itself, which stalls that backend.

  3. The WAL writer background process: wakes up every wal_writer_delay milliseconds (default 200ms) and flushes any unflushed WAL. This bounds the exposure window for asynchronous commit to at most three times wal_writer_delay.

  4. Checkpoint — all WAL through the checkpoint LSN is guaranteed to be on disk.

The WAL Writer Process

sql
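-- Monitor background writer, checkpointer and backend page writes
-- (column layout through PostgreSQL 16; PostgreSQL 17 moved the checkpoint
--  and backend columns to pg_stat_checkpointer and pg_stat_io)
SELECT buffers_checkpoint,
       buffers_clean,
       buffers_backend,
       buffers_backend_fsync,
       buffers_alloc
FROM pg_stat_bgwriter;

Note that pg_stat_bgwriter reports on heap and index page writes, not on the WAL writer itself; WAL activity is reported in pg_stat_wal. buffers_backend_fsync being non-zero means backends had to issue their own fsync calls because the checkpointer's fsync request queue was full. This is a performance warning sign.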


Checkpoints: Bounding Recovery Time

WAL solves durability, but it creates a new problem: if the database crashes and needs to replay WAL for recovery, how far back does it need to go? In theory, the entire WAL history since the database was created.

Checkpoints solve this by periodically guaranteeing that all dirty heap pages have been flushed to disk. After a checkpoint completes, crash recovery only needs to replay WAL from the checkpoint LSN forward.

What Happens During a Checkpoint

  1. The checkpointer records the current WAL position as the redo point and identifies all dirty pages in shared_buffers.
  2. Dirty pages are flushed to disk. This is the expensive part. The checkpointer spreads this work over time using checkpoint_completion_target to avoid a burst of I/O.
  3. The checkpoint record is written to WAL, recording the redo point.
  4. The WAL is flushed, ensuring the checkpoint record is durable, and pg_control is updated to point at it.

After a checkpoint:

  • All heap pages that were dirty before the checkpoint start are now on disk
  • Recovery can safely start from the checkpoint's redo point (the LSN at which the checkpoint began) instead of the beginning of WAL

Checkpoint Configuration

ini
# How often to checkpoint (time-based)
checkpoint_timeout = 5min            # default; increase for write-heavy workloads

# How often to checkpoint (WAL-size-based)
max_wal_size = 1GB                   # trigger checkpoint if WAL exceeds this

# How to spread checkpoint I/O
checkpoint_completion_target = 0.9   # use 90% of checkpoint_timeout interval for I/O

# Minimum WAL to keep (even after checkpoint)
min_wal_size = 80MB

The checkpoint_completion_target trade-off:

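Setting this to 0.9 means the checkpointer spreads its I/O across 90% of the checkpoint_timeout interval. At checkpoint_timeout = 5min, that's 270 seconds of smooth I/O. This reduces I/O spikes, but each checkpoint finishes later, so a crash just before one completes has to replay from the previous checkpoint's redo point: in the worst case nearly checkpoint_timeout plus the write window, about 9.5 minutes of WAL here.

Setting it to 0.1 means the checkpointer writes aggressively at the start: a high I/O spike, but each checkpoint completes sooner and the worst-case replay window shrinks toward checkpoint_timeout.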

For most production systems: checkpoint_completion_target = 0.9 and checkpoint_timeout = 15min with max_wal_size = 4GB is a reasonable starting point.
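The arithmetic behind this trade-off is easy to sketch. The function names below are hypothetical, and the replay bound rests on a stated assumption: recovery starts at the redo point of the last completed checkpoint, and the crash lands just before the next checkpoint finishes.

```python
def checkpoint_write_window(checkpoint_timeout_s, completion_target):
    """Seconds over which a timed checkpoint spreads its page writes."""
    return checkpoint_timeout_s * completion_target


def worst_case_replay_window(checkpoint_timeout_s, completion_target):
    """Rough upper bound (seconds of WAL) replayed after a crash.

    Assumption: replay starts at the redo point of the last *completed*
    checkpoint, and the crash hits just before the next one finishes.
    """
    return checkpoint_timeout_s + checkpoint_write_window(
        checkpoint_timeout_s, completion_target
    )
```

With the defaults (300 seconds, 0.9) the write window is 270 seconds and the bound is 570 seconds. With checkpoint_timeout = 15min the bound grows to 1,710 seconds, which is the recovery-time price paid for fewer checkpoints.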

Detecting Checkpoint Problems

sql
-- Checkpoints happening too frequently = WAL being generated faster than max_wal_size
SELECT checkpoints_timed,
       checkpoints_req,        -- req = triggered by WAL size, not time
       checkpoint_write_time,
       checkpoint_sync_time,
       buffers_checkpoint
FROM pg_stat_bgwriter;

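If checkpoints_req is much higher than checkpoints_timed, your max_wal_size is too small for your write rate. Each WAL-triggered checkpoint means the system generated roughly max_wal_size (1GB by default) of WAL before the timeout elapsed. Increase max_wal_size. Keep in mind that checkpoints_req also counts manual CHECKPOINT commands, and that on PostgreSQL 17 and later these counters live in pg_stat_checkpointer.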

sql
-- Also check the PostgreSQL log for:
--   LOG: checkpoint starting: wal
-- This means WAL-triggered, not time-triggered

synchronous_commit: Durability vs Latency

synchronous_commit controls when a COMMIT returns to the client. This is the most important durability knob in Postgres, and it is frequently misunderstood.

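Setting | Behavior | Durability Risk
on (default) | WAL flushed to primary disk before COMMIT returns; with synchronous_standby_names set, also flushed on the standby | None
remote_write | WAL flushed on primary and written (not flushed) on standby before COMMIT returns | Data lost only if the primary fails and the standby's OS crashes before its flush
remote_apply | WAL flushed and applied on standby before COMMIT returns | None; the transaction is also visible to queries on the standby
local | WAL flushed to primary only, regardless of standby | Standby can lag; a failover can lose transactions it has not received
off | COMMIT returns without waiting for WAL flush | Up to 3 × wal_writer_delay (600ms at the default 200ms) of committed data lost on crash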

When to Use synchronous_commit = off

The risk is small and bounded: if the server crashes, transactions committed within the last three times wal_writer_delay (600ms at the default of 200ms) may be lost. The database stays consistent; only those most recent commits disappear. Postgres will not lie to you about this: the client receives COMMIT and the transaction may genuinely be lost.

This is acceptable for:

  • Session-level analytics queries that write intermediate results
  • Audit log inserts where losing a few records on crash is acceptable
  • High-throughput event ingestion where the business can tolerate small data loss

It is not acceptable for:

  • Financial transactions
  • Any data where "committed means durable" is a contract with the user
sql
-- Enable asynchronous commit for a single session
SET synchronous_commit = off;

-- Or for a specific transaction
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO event_log ...;
COMMIT;  -- returns immediately; WAL is flushed within 3 × wal_writer_delay

The latency benefit is significant: a synchronous commit that requires an fsync typically takes 1–5ms on SSDs. An asynchronous commit returns in microseconds. For high-frequency writes, this is a 100–1000x latency improvement.


WAL and Full Page Writes

There is a subtlety in WAL that causes significant write amplification on systems with page sizes different from the disk's atomic write unit.

When a page is first modified after a checkpoint, Postgres writes the entire 8KB page into the WAL record, not just the changed bytes. This is called a Full Page Write (FPW).

Why Full Page Writes Exist

Modern disks write in 512-byte or 4096-byte sectors. If Postgres's 8KB page is partially written when the system crashes (a "torn page"), the partially-written page is corrupted. The WAL record of just the changed bytes cannot reconstruct a valid page from a torn one.

Full page writes ensure that WAL contains a complete, valid copy of every page that was modified after the last checkpoint. If a torn page is found during recovery, the full page image from WAL overwrites it completely.

ini
# Full page writes are on by default; do not disable
full_page_writes = on

The Write Amplification Implication

After every checkpoint, the first write to each page generates a WAL record that is 8KB + header overhead instead of just the changed bytes.

On a write-heavy workload with frequent checkpoints:

  • If checkpoint_timeout = 1min and you have 10,000 dirty pages
  • Every minute, those 10,000 pages get full page writes after the checkpoint
  • That's 80MB of WAL per minute from FPWs alone, regardless of actual change size

This is why aggressive checkpoint tuning increases WAL volume. A checkpoint every minute means every page gets a full page write every minute. A checkpoint every 15 minutes means each page gets a full page write every 15 minutes — 15x less FPW overhead.
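The same estimate as a short calculation. It is an upper bound under one assumption taken from the example above: every page in the dirty working set is modified again after each checkpoint. The function name is hypothetical.

```python
PAGE_SIZE = 8192  # bytes, default Postgres block size


def fpw_bytes_per_minute(dirty_pages, checkpoint_interval_min, page_size=PAGE_SIZE):
    """Upper-bound FPW volume per minute.

    Assumes every page in the working set is modified after each checkpoint,
    so each checkpoint cycle costs one full page image per page.
    """
    return dirty_pages * page_size / checkpoint_interval_min
```

For 10,000 pages, a one-minute interval gives 81,920,000 bytes per minute (the 80MB figure above), and a 15-minute interval gives one fifteenth of that.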

sql
-- Measure FPW overhead
SELECT wal_records,
       wal_fpi,                              -- full page images written
       wal_bytes,
       pg_size_pretty(wal_bytes) AS wal_size
FROM pg_stat_wal;
-- High wal_fpi relative to wal_records = lots of FPW overhead
-- Usually caused by frequent checkpoints or large dirty working set

WAL Retention and pg_wal Sizing

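After a checkpoint, segments that are no longer needed can be removed or recycled. Postgres keeps every segment from the oldest of these positions onward: the redo point of the latest checkpoint, the oldest restart_lsn of any replication slot, the wal_keep_size reserve, and (when archive_mode is on) the oldest segment not yet archived.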

What Keeps WAL Around

Replication slots: if a standby falls behind and has a replication slot, Postgres retains all WAL since that slot's restart_lsn. A lagging or disconnected standby with a slot will cause pg_wal/ to grow without bound, unless max_slot_wal_keep_size (PostgreSQL 13+) caps how much a slot may retain.

sql
-- Check replication slot lag
SELECT slot_name,
       slot_type,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal_size
FROM pg_replication_slots;

If retained_wal_size is growing and the slot is not active, you have a disconnected standby holding WAL hostage. This will eventually fill your disk.

wal_keep_size — the minimum amount of WAL to keep, regardless of slots or replication need:

ini
# Keep at least 1GB of WAL even after it's no longer needed for recovery
wal_keep_size = 1GB

This is useful when standbys connect without replication slots (they can catch up from retained WAL rather than needing a base backup).

Sizing pg_wal

Recommended pg_wal partition size for production:

pg_wal size = max_wal_size * 2  +  wal_keep_size  +  replication lag buffer

For a system with max_wal_size = 4GB and wal_keep_size = 2GB, allocate at least 12–16GB for pg_wal. Running out of space in pg_wal causes Postgres to PANIC and shut down — there is no graceful degradation.
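The sizing rule can be written out as a one-line helper. The function name is hypothetical, and the 2GB to 6GB lag buffers used below are an assumption chosen to reproduce the 12–16GB range quoted above.

```python
GB = 1024 ** 3


def pg_wal_partition_bytes(max_wal_size, wal_keep_size, lag_buffer):
    """Apply the sizing rule: 2 * max_wal_size + wal_keep_size + lag buffer."""
    return 2 * max_wal_size + wal_keep_size + lag_buffer


low = pg_wal_partition_bytes(4 * GB, 2 * GB, 2 * GB)   # 12GB
high = pg_wal_partition_bytes(4 * GB, 2 * GB, 6 * GB)  # 16GB
```

The lag buffer is the term to size deliberately: it should cover the WAL generated during the longest standby or slot outage you intend to survive.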


Streaming Replication: WAL as the Replication Protocol

Postgres replication is built directly on WAL. There is no separate change data capture layer, no triggers, no logical change format at the physical replication level. The standby receives the WAL stream and replays it.

Physical Replication Architecture

Primary:
  Backend processes → WAL buffer → WAL writer → pg_wal segments
                                               ↓
                                         WAL sender process

Standby:
  WAL receiver process → pg_wal segments → startup process (recovery) → heap files

The WAL sender on the primary reads WAL segments and streams them to connected standbys. The WAL receiver on the standby writes received WAL to its local pg_wal/ and signals the startup process to apply it.

The startup process on the standby is running in recovery mode — it is perpetually replaying WAL, exactly as if it were recovering from a crash. The only difference is it never finishes: it keeps waiting for more WAL to arrive.

Replication Lag

Replication lag has multiple components, each measurable independently:

sql
-- On the primary: view all connected standbys
SELECT application_name,
       state,
       sent_lsn,
       write_lsn,
       flush_lsn,
       replay_lsn,
       pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn))  AS total_lag_bytes,
       pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn))   AS network_lag_bytes,
       pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn))  AS flush_lag_bytes,
       pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn)) AS apply_lag_bytes,
       write_lag,    -- built-in time-based lag columns (intervals)
       flush_lag,
       replay_lag
FROM pg_stat_replication;

  • Network lag (sent - write): WAL sent but not yet written to standby's disk. Network bandwidth bottleneck.
  • Flush lag (write - flush): Written to standby disk but not fsynced. Standby I/O bottleneck.
  • Apply lag (flush - replay): Fsynced but not yet applied to standby's heap. CPU/disk bottleneck on replay.

Apply lag is the most dangerous for read-after-write workloads on standbys: queries there see data only up to replay_lsn, so a read routed to the standby right after a write can miss every transaction that has not been replayed yet.
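The three components are plain subtractions on LSNs, which the following sketch makes explicit. The helper names and the sample LSN values are made up for illustration.

```python
def parse_lsn(text):
    """Convert the 'high/low' hex text form of an LSN to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_breakdown(sent_lsn, write_lsn, flush_lsn, replay_lsn):
    """Split replication lag (in bytes) into its three components."""
    sent, write, flush, replay = (
        parse_lsn(x) for x in (sent_lsn, write_lsn, flush_lsn, replay_lsn)
    )
    return {
        "network": sent - write,   # sent but not yet written on the standby
        "flush": write - flush,    # written but not yet fsynced
        "apply": flush - replay,   # fsynced but not yet replayed
        "total": sent - replay,
    }


# Hypothetical sample values, as they might appear in pg_stat_replication
sample = lag_breakdown("0/5000000", "0/4800000", "0/4000000", "0/3000000")
```

In this sample the apply component (16MB) is twice each of the other two (8MB each), which would point at replay on the standby rather than the network.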

Synchronous Replication

By default, replication is asynchronous. The primary does not wait for the standby before returning COMMIT. To require standby confirmation:

ini
# On primary: require at least one standby to confirm before commit
synchronous_standby_names = 'standby1'

# What "confirm" means (see synchronous_commit table above)
synchronous_commit = remote_apply   # standby has applied the transaction

The cost: every commit now has latency equal to the network round-trip + standby flush/apply time. For a standby 5ms away, commit latency increases by at least 5ms. For a cross-region standby, this can be 50–100ms per commit — catastrophic for transactional workloads.

The correct pattern for most production systems: keep synchronous_standby_names set, make local durability the server default (synchronous_commit = local) so normal traffic does not wait for the standby, and raise the level per transaction for critical writes:

sql
-- Normal transaction: waits only for the local WAL flush
-- (assumes the server default is synchronous_commit = local)
BEGIN;
INSERT INTO events ...;
COMMIT;

-- Critical transaction: waits for the standby
BEGIN;
SET LOCAL synchronous_commit = remote_apply;
UPDATE account_balances SET amount = amount - 100 WHERE id = 1;
COMMIT;  -- waits for standby confirmation

WAL-Level Settings and wal_level

wal_level controls how much information is written to WAL. It has three values that matter in practice:

ini
wal_level = minimal   # Minimum for crash recovery. No replication possible.
wal_level = replica   # Default. Enables streaming replication.
wal_level = logical   # Enables logical replication and logical decoding.

logical mode writes more WAL because it includes enough information for logical decoding (reconstructing row-level changes for CDC consumers like Debezium). The overhead is typically 10–30% more WAL on write-heavy workloads.

Logical Replication vs Physical Replication

Aspect | Physical | Logical
What is replicated | Raw WAL (page changes) | Row-level changes
Schema compatibility | Standby must have an identical schema | Can replicate to different schema/version
Replication granularity | Entire cluster | Individual tables
Overhead | Lower | Higher (WAL decoding)
Use case | HA standby | CDC, cross-version upgrades, partial replication

Logical replication uses replication slots to track consumer position. The slot ensures WAL is retained until the consumer confirms it has processed everything. A stalled logical replication consumer is a pg_wal filling time bomb.


WAL and Write Amplification: The Full Picture

Every write in Postgres touches multiple locations. Understanding the full write amplification helps you reason about storage I/O and WAL volume.

For a single UPDATE to one row:

1. WAL buffer (memory):
   - Full page write record (~8192 bytes) if first modification post-checkpoint
   - Update record (~200-400 bytes)

2. WAL file (disk):
   - Same data, flushed on commit

3. shared_buffers (memory):
   - Old tuple marked dead (xmax set)
   - New tuple written to available space

4. Heap file (disk, deferred):
   - Eventually flushed by background writer or checkpoint

5. If indexes exist (and the update is not HOT):
   - Index page modified in shared_buffers
   - Full page write record for index page (if first modification post-checkpoint)
   - Eventually flushed to index file on disk

6. Visibility Map (if applicable):
   - VM bit cleared (page is no longer all-visible)
   - WAL record for VM update

7. pg_clog / pg_xact:
   - Transaction status updated on commit

A single row update can generate 3–5 WAL records and touch 4–8 distinct on-disk locations. On a table with 5 indexes, an update touches even more. This is the true cost of an UPDATE — not the query execution time, but the write amplification cascade it triggers.
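A back-of-the-envelope estimator for this cascade might look like the following. It is a rough sketch, not a model of Postgres's actual record sizes: the 300-byte heap record is the midpoint of the 200–400 byte range above, the 100-byte index record is purely an illustrative assumption, and the commit and visibility map records are ignored.

```python
PAGE_SIZE = 8192  # bytes, default Postgres block size


def estimate_update_wal_bytes(
    n_indexes,
    hot=False,
    first_touch_since_checkpoint=False,
    heap_record_bytes=300,   # assumption: midpoint of the 200-400 byte range
    index_record_bytes=100,  # assumption: illustrative only, varies with key width
):
    """Rough WAL bytes for one single-row UPDATE under the stated assumptions."""
    indexes_touched = 0 if hot else n_indexes
    total = heap_record_bytes + indexes_touched * index_record_bytes
    if first_touch_since_checkpoint:
        # One full page image for the heap page plus one per index page touched.
        total += PAGE_SIZE * (1 + indexes_touched)
    return total
```

On a table with 5 indexes, the estimate ranges from 300 bytes (HOT update, pages already touched since the checkpoint) to roughly 50KB (non-HOT update that is the first touch of the heap page and all five index pages). That spread, more than two orders of magnitude, is why HOT updates and checkpoint spacing matter so much.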

Measuring Actual WAL Generation Per Query

sql
-- Use EXPLAIN (WAL) to see WAL generated by a query (PostgreSQL 13+)
-- Note: ANALYZE actually executes the UPDATE
EXPLAIN (ANALYZE, WAL, BUFFERS)
UPDATE transactions SET status = 'confirmed'
WHERE block_height = 18500050;

-- Output includes:
--   WAL: records=3 fpi=2 bytes=18432
--   records = number of WAL records written
--   fpi     = full page images (up to 8KB each)
--   bytes   = total WAL bytes generated

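fpi=2 in this output means two full page images were written: up to 16KB of WAL just for page images, plus the actual change data. Full page images are only written the first time a page is modified after a checkpoint, so repeat executions against the same pages are much cheaper. But if this query runs 1,000 times per second and each execution touches pages not yet modified since the last checkpoint, that's up to 16MB/s of WAL from FPWs alone for this one query pattern.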
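When collecting these numbers from many plans, a small parser helps. The sketch below assumes the text-format line shown above (WAL: records=… fpi=… bytes=…); the function names are hypothetical, and the share is an upper bound because a full page image can be smaller than 8KB.

```python
import re

PAGE_SIZE = 8192  # bytes, default Postgres block size


def parse_wal_line(line):
    """Extract records, fpi and bytes from an EXPLAIN (ANALYZE, WAL) 'WAL:' line."""
    match = re.search(r"WAL:\s+records=(\d+)\s+fpi=(\d+)\s+bytes=(\d+)", line)
    if match is None:
        raise ValueError("no WAL counters found in: %r" % line)
    records, fpi, total_bytes = (int(group) for group in match.groups())
    return {"records": records, "fpi": fpi, "bytes": total_bytes}


def fpi_share_upper_bound(stats, page_size=PAGE_SIZE):
    """Upper bound on the fraction of WAL bytes spent on full page images."""
    if stats["bytes"] == 0:
        return 0.0
    return min(1.0, stats["fpi"] * page_size / stats["bytes"])


stats = parse_wal_line("  WAL: records=3 fpi=2 bytes=18432")
share = fpi_share_upper_bound(stats)
```

For the example line, at most 16,384 of the 18,432 bytes are page images, so close to 90% of this statement's WAL is FPW overhead rather than change data.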


Diagnosing WAL Problems in Production

Symptom: pg_wal Growing Without Bound

sql
-- Step 1: Check replication slots
SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained
FROM pg_replication_slots;

-- Step 2: Check standby lag
SELECT application_name, replay_lag
FROM pg_stat_replication;

-- Step 3: If a slot is inactive and you can afford to drop it:
SELECT pg_drop_replication_slot('stalled_slot_name');

Symptom: Frequent Checkpoint Warnings in Logs

LOG: checkpoints are occurring too frequently (9 seconds apart)
HINT: Consider increasing max_wal_size.
sql
-- Check WAL generation rate
SELECT pg_size_pretty(wal_bytes) AS total_wal,
       wal_records,
       wal_fpi,
       stats_reset
FROM pg_stat_wal;

-- Calculate WAL rate since stats reset
SELECT pg_size_pretty(
         wal_bytes / EXTRACT(EPOCH FROM (now() - stats_reset))
       ) AS wal_bytes_per_second
FROM pg_stat_wal;

If WAL generation is 100MB/s and max_wal_size = 1GB, checkpoints will trigger roughly every 10 seconds (max_wal_size is a soft limit, so the exact interval varies). Either increase max_wal_size or reduce your write rate (fewer updates, batch writes, use HOT updates).
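The relationship between WAL rate, max_wal_size, and checkpoint spacing can be sketched in both directions. The function names are hypothetical, and the model is deliberately simplified: it assumes a steady WAL rate and treats max_wal_size as the exact trigger point.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def seconds_between_requested_checkpoints(max_wal_size, wal_bytes_per_second):
    """Approximate time to generate max_wal_size of WAL at a steady rate."""
    if wal_bytes_per_second <= 0:
        raise ValueError("WAL rate must be positive")
    return max_wal_size / wal_bytes_per_second


def max_wal_size_for_interval(wal_bytes_per_second, target_interval_s):
    """max_wal_size needed so WAL volume alone does not force a checkpoint sooner."""
    return wal_bytes_per_second * target_interval_s


interval = seconds_between_requested_checkpoints(1 * GB, 100 * MB)  # about 10 s
needed = max_wal_size_for_interval(100 * MB, 15 * 60)               # for 15min spacing
```

At a sustained 100MB/s, letting checkpoint_timeout = 15min govern checkpoints would need about 88GB of max_wal_size, which shows why reducing the write rate is sometimes the only practical fix.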

Symptom: High Replication Lag Spike During Checkpoint

During a checkpoint, the I/O load on the primary increases as dirty pages are flushed. This I/O contention can delay WAL sender from keeping up with new WAL, causing lag spikes on standbys.

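Fix: keep checkpoint_completion_target = 0.9 to spread I/O on the primary. On the standby, recovery_prefetch (PostgreSQL 15+, governed by maintenance_io_concurrency) lets the startup process prefetch blocks referenced by the WAL it is about to replay.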

Symptom: Standby Apply Lag Grows Under Write Load

Apply lag growing means the standby's recovery process can't apply WAL as fast as it arrives. This is CPU-bound on the standby (single-threaded WAL apply).

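PostgreSQL 16 introduced parallel apply, but only for logical replication subscribers (large in-progress transactions on subscriptions created with streaming = parallel):

ini
# On the logical replication subscriber (PostgreSQL 16+)
max_parallel_apply_workers_per_subscription = 4

# On a physical standby: make sure no deliberate apply delay is configured
recovery_min_apply_delay = 0

For physical replication there is no parallel apply: WAL replay is performed by the single startup process. The fix is usually to ensure the standby has faster single-core performance and storage, or to reduce write load on the primary.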


Production Tuning Reference

ini
# WAL generation
wal_level = replica                  # or logical if CDC is needed
full_page_writes = on                # never disable

# WAL buffering
wal_buffers = 64MB                   # increase from default for write-heavy workloads
wal_writer_delay = 200ms             # default is fine

# Checkpoint behavior
checkpoint_timeout = 15min           # reduce checkpoint frequency
max_wal_size = 4GB                   # for high-write systems, go higher
min_wal_size = 1GB
checkpoint_completion_target = 0.9

# Durability
synchronous_commit = on              # never disable globally
# Use SET LOCAL synchronous_commit = off for specific high-freq writes

# Replication retention
wal_keep_size = 2GB                  # safety buffer for standbys without slots

# Archive (if using PITR)
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'
archive_timeout = 60                 # force a segment switch every 60s if any WAL was written

The Production Incident: WAL Archiving Stall Causing 6-Hour Replica Lag

Context: A blockchain indexer processing ~2,000 TPS with a primary and two physical standbys.

What happened:

An archive_command was configured to copy WAL segments to an NFS mount. During a network partition, the NFS mount became unresponsive. The archive_command hung — it did not fail, it just never returned.

Postgres was waiting for the archive command to complete before allowing WAL segment recycling. But with the archive command hanging, no segments could be recycled. pg_wal/ grew until it hit the disk limit, at which point Postgres paused WAL writing.

Paused WAL writing means no commits can complete. The primary appeared to hang. All application writes stalled.

The standbys: they had received WAL up to the pause point and were applying it, but since the primary was paused, their apply caught up quickly — and then they were waiting for new WAL. Replication appeared healthy. The lag was invisible until the primary recovered.

After the partition healed: The archive command eventually completed or timed out. Postgres resumed WAL writing. But the WAL sender had to catch up standbys on all the writes that piled up during the stall — leading to 6 hours of visible replication lag as the standbys processed the backlog.

The fixes:

bash
# 1. Set a timeout on the archive command
archive_command = 'timeout 30 cp %p /mnt/wal_archive/%f'

# 2. Monitor archive status
sql
-- Check for archive failures
SELECT archived_count,
       last_archived_wal,
       last_archived_time,
       failed_count,
       last_failed_wal,
       last_failed_time,
       stats_reset
FROM pg_stat_archiver;
ini
# 3. Set a maximum archive wait time
# If archive fails, Postgres retries; add alerting on last_failed_time

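The lesson: WAL archiving stalls are silent until they're catastrophic. pg_stat_archiver.failed_count incrementing without alerting is a ticking clock toward a primary outage, and a command that hangs never increments it at all, so also alert when last_archived_time stops advancing.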
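An alerting check along these lines could be sketched as follows. It works on two samples of pg_stat_archiver taken by whatever monitoring agent you use; the function name and the five-minute threshold are hypothetical and should be tuned to archive_timeout and the write rate.

```python
from datetime import datetime, timedelta, timezone


def archiver_alerts(prev_failed_count, failed_count, last_archived_time, now,
                    max_archive_age=timedelta(minutes=5)):
    """Return alert strings from two samples of pg_stat_archiver.

    max_archive_age is a hypothetical threshold, not a Postgres setting.
    """
    alerts = []
    if failed_count > prev_failed_count:
        alerts.append("archive_command is failing")
    # A hung command never increments failed_count, so also watch staleness.
    if last_archived_time is None or now - last_archived_time > max_archive_age:
        alerts.append("no segment archived recently")
    return alerts
```

The staleness check is the one that would have caught the incident above. On a mostly idle system it only makes sense together with archive_timeout, since otherwise long gaps between archived segments are normal.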


Summary

Concept | Key Takeaway
WAL purpose | Sequential writes for durability; heap pages written lazily in background
Commit durability | WAL flushed to disk before COMMIT returns (by default). Heap page can wait.
Full page writes | First write to each page post-checkpoint writes full 8KB to WAL, a major write amplification driver
Checkpoints | Bound recovery time by flushing all dirty pages; too-frequent checkpoints = excessive FPW overhead
synchronous_commit | off trades a bounded loss window (up to 3 × wal_writer_delay, 600ms by default) for commit latency. Acceptable for some workloads.
Write amplification | A single UPDATE touches WAL buffer, WAL file, heap buffer, heap file, index buffer, index file, VM
Streaming replication | Standby receives WAL stream and replays it in perpetual recovery mode
Replication lag | Three components: network lag, flush lag, apply lag; measure each independently
Replication slots | Retain WAL for consumers; stalled slots grow pg_wal without bound
WAL archiving | archive_command must have a timeout; stalls silently cause disk exhaustion

WAL is the foundation everything else in Postgres is built on — replication, PITR, crash recovery, and even some of the MVCC mechanics you saw in Module 2. Module 4 goes into the process that runs on top of MVCC and WAL to keep your database healthy: Autovacuum.
