
Every byte you write passes through WAL. Understanding WAL is understanding your write amplification.

Module 3 — Write-Ahead Logging: Durability, Replication, and the Price of Every Write

What this module covers: The WAL mechanism from first principles — why it exists, what is physically written, how checkpoints and background writes interact, how streaming replication flows from WAL, and the production consequences of misconfiguring any of it. By the end, you will be able to calculate write amplification for any workload and diagnose WAL-related performance problems from pg_stat_* views alone.

Why WAL Exists: The Durability Problem

Consider what happens when you commit a transaction in a naive database implementation.

The data must eventually reach disk. Disk writes are expensive — an 8KB page write on a spinning disk takes 5–10ms. A modern OLTP system might commit thousands of transactions per second. If every commit required flushing all modified heap pages to disk synchronously, throughput would collapse.

But you cannot skip the flush. If the system crashes before modified pages reach disk, those committed transactions are lost. That violates Durability — the D in ACID.

The naive options are both unacceptable:

Flush heap pages on every commit → terrible write throughput
Don't flush → committed data lost on crash

WAL is the solution to this dilemma.

Instead of flushing heap pages on commit, Postgres flushes a compact sequential record of what changed. This record — the Write-Ahead Log — is small, sequential, and fast to write. Heap pages can be flushed lazily in the background.

On crash recovery, Postgres replays the WAL to reconstruct any committed changes that hadn't yet made it to heap pages. The WAL is the authoritative record of what happened. The heap is a derived, materialized form of that record.

This is the core invariant of WAL: a transaction's WAL record must be on disk before the commit returns to the client. The heap page can wait. The WAL cannot.

Physical Structure of the WAL

Segments, Pages, and Records

WAL is stored in $PGDATA/pg_wal/ as a series of segment files, each 16MB by default (configurable at initdb time via --wal-segsize).

$PGDATA/pg_wal/
├── 000000010000000000000001
├── 000000010000000000000002
├── 000000010000000000000003
└── ...

The filename encodes three components in hexadecimal:

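Timeline ID (00000001): identifies the database history branch (a new timeline starts after a PITR recovery or a standby promotion)
Log ID (00000000): the high 32 bits of the WAL position, so it advances once per 4GB of WAL
Segment within the log ID (00000001): the segment's index inside that 4GB range; with 16MB segments it runs from 00000000 to 000000FF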
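The naming scheme can be reproduced with a few lines of arithmetic. The sketch below assumes the default 16MB segment size and timeline 1; `wal_file_name` is a hypothetical helper written for this module, not a Postgres API (on a live server, `pg_walfile_name()` is the built-in to use).

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default --wal-segsize


def wal_file_name(position: int, timeline: int = 1,
                  segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the name of the segment file holding a WAL byte position."""
    segment_no = position // segment_size
    # One "log ID" spans 4GB of WAL, i.e. 256 segments of 16MB.
    segments_per_log_id = 0x1_0000_0000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_no // segments_per_log_id,  # middle field of the file name
        segment_no % segments_per_log_id,   # last field of the file name
    )


# Byte position 0x24F3A1820 falls in segment 0x4F of log ID 2.
print(wal_file_name(0x24F3A1820))  # prints 00000001000000020000004F
```

Because the last field wraps after 256 segments, the file that follows `0000000100000000000000FF` is `000000010000000100000000`.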

Each 16MB segment is divided into 8KB pages (matching the heap page size). Each page has a header. Within pages, WAL is written as a stream of variable-length WAL records.

LSN: The Coordinate System

Every position in the WAL stream is identified by a Log Sequence Number (LSN) — a 64-bit integer representing the byte offset from the beginning of the WAL.

sql
-- Current write position (where the next WAL record will be written)
SELECT pg_current_wal_lsn();
--  pg_current_wal_lsn
-- --------------------
--  2/4F3A1820

-- Current flush position (what has been flushed to disk)
SELECT pg_current_wal_flush_lsn();

-- Total WAL generated since the cluster was initialized (the LSN does not reset on restart)
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'));

LSNs appear everywhere in Postgres: in page headers (pd_lsn records the LSN of the last WAL record that modified the page), in replication slots, in recovery targets for PITR, and in monitoring views.
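The textual form `X/Y` is just the two 32-bit halves of that 64-bit position in hexadecimal, so the arithmetic behind `pg_wal_lsn_diff` is easy to reproduce offline, for example when comparing LSNs copied out of logs. A minimal sketch; the helper names are hypothetical:

```python
def parse_lsn(lsn: str) -> int:
    """Convert 'X/Y' (hex high half / hex low half) to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(position: int) -> str:
    """Convert a byte position back to the 'X/Y' text form."""
    return "%X/%X" % (position >> 32, position & 0xFFFFFFFF)


def lsn_diff(newer: str, older: str) -> int:
    """Bytes of WAL between two LSNs (negative if 'newer' is behind)."""
    return parse_lsn(newer) - parse_lsn(older)


print(lsn_diff("2/4F3A1820", "2/4F3A1000"))  # prints 2080
```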

Anatomy of a WAL Record

Every WAL record has:

XLogRecord (the fixed-size record header, 24 bytes):
  xl_tot_len   uint32   — total length of the record
  xl_xid       TransactionId — transaction that generated this record
  xl_prev      XLogRecPtr   — LSN of the previous record (for backward traversal)
  xl_info      uint8        — flags and resource manager subtype
  xl_rmid      RmgrId       — resource manager ID (identifies record type)
  xl_crc       pg_crc32c    — CRC of the record

WALRecordData:
  [variable-length payload — format depends on resource manager]
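To make the header layout concrete, the sketch below packs and unpacks the six fields with Python's `struct` module. It assumes a little-endian machine and two bytes of padding between `xl_rmid` and `xl_crc`; the sample values are invented for illustration and the CRC is not verified. For real WAL files, `pg_waldump` is the right tool.

```python
import struct
from typing import NamedTuple

# Assumed on-disk layout: uint32, uint32, uint64, uint8, uint8, 2 pad bytes, uint32
HEADER = struct.Struct("<IIQBB2xI")


class RecordHeader(NamedTuple):
    xl_tot_len: int  # total record length in bytes
    xl_xid: int      # transaction that generated the record
    xl_prev: int     # LSN of the previous record
    xl_info: int     # flag bits and resource-manager subtype
    xl_rmid: int     # resource manager ID
    xl_crc: int      # CRC of the record


def parse_header(raw: bytes) -> RecordHeader:
    """Decode the fixed-size header at the start of a WAL record."""
    return RecordHeader(*HEADER.unpack_from(raw))


# Invented sample: a 57-byte record from xid 742, resource manager 10.
sample = HEADER.pack(57, 742, 0x24F3A17E8, 0x00, 10, 0xDEADBEEF)
header = parse_header(sample)
print(HEADER.size, header.xl_rmid, hex(header.xl_prev))
```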

The resource manager (xl_rmid) identifies what kind of operation this record describes. The IDs follow the order of the server's resource manager list; pg_waldump --rmgr=list prints the names known to your version. Key resource managers:

rmid	Name	What it logs
0	XLOG	Checkpoint records, WAL segment switches, backup end markers
1	Transaction	COMMIT, ABORT, PREPARE
2	Storage	File creation/truncation
3	CLOG	Transaction status page updates
9	Heap2	VACUUM pruning, freezing, visibility map updates, multi-insert
10	Heap	INSERT, UPDATE, DELETE, HOT update
11	Btree	B-tree inserts, splits, page deletions

What an INSERT Actually Writes to WAL

When you insert a row, the WAL record for the heap contains:

The full tuple data, which is enough to reconstruct the row on recovery
The target page and offset — where the tuple was placed in the heap
The transaction ID — for visibility tracking

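For an UPDATE, a single heap WAL record contains:

A reference to the old tuple (its page/offset), whose xmax is set
The new tuple data. When the old and new versions sit on the same page, Postgres can omit the leading and trailing bytes that are identical to the old tuple; this applies to HOT and non-HOT updates alike

For a DELETE, WAL contains:

A record marking the old tuple's xmax as set

This is why updates are expensive in Postgres: an update is effectively a delete plus an insert at the WAL level, and a non-HOT update adds a WAL record for every index that needs a new entry. A HOT update saves WAL mainly by skipping those index records.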

sql
-- Measure WAL generated by a single UPDATE
SELECT pg_current_wal_lsn() AS before \gset
UPDATE transactions SET status = 'confirmed' WHERE id = 1;
SELECT pg_size_pretty(
  pg_wal_lsn_diff(pg_current_wal_lsn(), :'before')
) AS wal_generated;

-- Typical result for a row update: 200–400 bytes of WAL

The Write Path: From Memory to Disk

Understanding exactly when data moves from memory to disk is critical for reasoning about both performance and durability.

Shared Buffers and the WAL Buffer

Postgres has two separate in-memory write buffers:

shared_buffers — the main page cache. When you modify a page (insert, update, delete), the modified page lives here until a background writer or checkpoint flushes it to the heap file on disk. A dirty page in shared_buffers is a performance optimization — it defers expensive random writes.

WAL buffers (wal_buffers; the default of -1 auto-sizes to 1/32 of shared_buffers, no less than 64kB and no more than one WAL segment, typically 16MB): a circular buffer in shared memory where WAL records are accumulated before being written to the WAL segment files. WAL records flow from the WAL buffer to disk much more frequently than heap pages.

When WAL Is Flushed

WAL is flushed to disk (fsync'd) at these moments:

Transaction commit: by default (synchronous_commit = on), WAL is flushed before the commit acknowledgement is sent to the client. This is what makes committed transactions durable.
WAL buffer fills: if the circular WAL buffer fills up before a commit, the backend that needs space writes the oldest buffered WAL out itself, which stalls that backend.
The WAL writer background process: wakes up every wal_writer_delay milliseconds (default 200ms) and flushes any unflushed WAL. This bounds the exposure window for asynchronous commit to at most three times wal_writer_delay.
Checkpoint: all WAL through the checkpoint record is guaranteed to be on disk.

The WAL Writer Process

sql
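-- Monitor background writer, checkpointer, and backend buffer writes
-- (PostgreSQL 16 and earlier; 17 moved buffers_checkpoint to pg_stat_checkpointer
--  and replaced buffers_backend / buffers_backend_fsync with pg_stat_io)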
SELECT
  buffers_checkpoint,
  buffers_clean,
  buffers_backend,
  buffers_backend_fsync,
  buffers_alloc
FROM pg_stat_bgwriter;

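buffers_backend_fsync being non-zero means backends had to issue their own fsync calls because the checkpointer's fsync request queue was full, so the checkpointer is falling behind. This is a performance warning sign. Note that this view counts heap and index buffer writes, not WAL; WAL write and sync activity is reported in pg_stat_wal.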

Checkpoints: Bounding Recovery Time

WAL solves durability, but it creates a new problem: if the database crashes and needs to replay WAL for recovery, how far back does it need to go? In theory, the entire WAL history since the database was created.

Checkpoints solve this by periodically guaranteeing that all dirty heap pages have been flushed to disk. After a checkpoint completes, crash recovery only needs to replay WAL from the checkpoint LSN forward.

What Happens During a Checkpoint

The checkpointer identifies all dirty pages in shared_buffers.
Dirty pages are flushed to disk — this is the expensive part. The checkpointer spreads this work over time using checkpoint_completion_target to avoid a burst of I/O.
The checkpoint record is written to WAL — recording the checkpoint LSN.
The WAL is flushed — ensuring the checkpoint record is durable.

After a checkpoint:

All heap pages that were dirty before the checkpoint start are now on disk
Recovery can safely start from the checkpoint's redo LSN (the WAL position at which the checkpoint began) instead of the beginning of WAL

Checkpoint Configuration

ini
# How often to checkpoint (time-based)
checkpoint_timeout = 5min          # default; increase for write-heavy workloads

# How often to checkpoint (WAL-size-based)
max_wal_size = 1GB                 # trigger checkpoint if WAL exceeds this

# How to spread checkpoint I/O
checkpoint_completion_target = 0.9 # use 90% of checkpoint_timeout interval for I/O

# Minimum WAL to keep (even after checkpoint)
min_wal_size = 80MB

The checkpoint_completion_target trade-off:

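Setting this to 0.9 (the default since PostgreSQL 14) means the checkpointer spreads its I/O across 90% of the checkpoint_timeout interval. At checkpoint_timeout = 5min, that's 270 seconds of smooth I/O. This reduces I/O spikes. The cost is recovery time: replay starts at the redo point of the last completed checkpoint, so a crash late in a spread-out checkpoint can require replaying close to two intervals of WAL, roughly 5 + 4.5 minutes here.

Setting it to 0.1 means the checkpointer writes aggressively at the start: a high I/O spike, in exchange for checkpoints that complete sooner and so trim the worst-case replay range.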

For most production systems: checkpoint_completion_target = 0.9 and checkpoint_timeout = 15min with max_wal_size = 4GB is a reasonable starting point.

Detecting Checkpoint Problems

sql
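-- Checkpoints happening too frequently = WAL being generated faster than max_wal_size
-- (PostgreSQL 16 and earlier; in 17+ query pg_stat_checkpointer:
--  num_timed, num_requested, write_time, sync_time, buffers_written)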
SELECT
  checkpoints_timed,
  checkpoints_req,          -- req = triggered by WAL size, not time
  checkpoint_write_time,
  checkpoint_sync_time,
  buffers_checkpoint
FROM pg_stat_bgwriter;

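If checkpoints_req is much higher than checkpoints_timed, your max_wal_size is probably too small for your write rate. Each WAL-triggered checkpoint means the system approached max_wal_size (1GB by default) before the timeout elapsed. The counter also includes manual CHECKPOINT commands and checkpoints forced by base backups, so confirm the cause in the server log. If WAL volume is the cause, increase max_wal_size.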

sql
-- Also check the PostgreSQL log for:
-- LOG: checkpoint starting: wal
-- This means WAL-triggered, not time-triggered

`synchronous_commit`: Durability vs Latency

synchronous_commit controls when a COMMIT returns to the client. This is the most important durability knob in Postgres, and it is frequently misunderstood.

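Setting	Behavior	Durability Risk
`on` (default)	WAL flushed to primary disk before COMMIT returns; if synchronous standbys are configured, also flushed on the standby	None
`remote_write`	WAL written to the standby's operating system (not yet flushed there) before COMMIT returns	Data lost only if the primary fails and the standby's OS crashes before its flush
`remote_apply`	WAL flushed and applied on standby before COMMIT returns	None; the commit is already visible to queries on the standby
`local`	WAL flushed to primary only, without waiting for any standby	Standby can lag, so a failover can lose recent commits
`off`	COMMIT returns without waiting for WAL flush	Up to three times `wal_writer_delay` (600ms at the default 200ms) of committed data lost on crash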

When to Use `synchronous_commit = off`

The risk is small and bounded: if the server crashes, transactions committed within the last three wal_writer_delay intervals at most (600ms at the default 200ms) may be lost. The database stays consistent, but the loss is real: the client received COMMIT and the transaction may be gone.

This is acceptable for:

Session-level analytics queries that write intermediate results
Audit log inserts where losing a few records on crash is acceptable
High-throughput event ingestion where the business can tolerate small data loss

It is not acceptable for:

Financial transactions
Any data where "committed means durable" is a contract with the user
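To decide which side of that line a workload falls on, it helps to turn the window into a number of commits. The sketch below uses the documented upper bound of three times wal_writer_delay; the function name and the 2,000 commits/s figure are illustrative assumptions.

```python
def async_commit_exposure(wal_writer_delay_ms: float,
                          commits_per_second: float) -> tuple:
    """Worst-case loss window (ms) and commits inside it after a crash."""
    # Upper bound on the risk window: three WAL writer cycles.
    window_ms = 3 * wal_writer_delay_ms
    commits_at_risk = commits_per_second * window_ms / 1000
    return window_ms, commits_at_risk


window_ms, at_risk = async_commit_exposure(200, 2000)
print(window_ms, at_risk)  # prints 600 1200.0
```

At the default delay and 2,000 commits per second, a crash can therefore discard on the order of a thousand acknowledged transactions. That is the figure to weigh against the latency gain.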

sql
-- Enable asynchronous commit for a single session
SET synchronous_commit = off;

-- Or for a specific transaction
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO event_log ...;
COMMIT; -- returns immediately; WAL reaches disk within 3 x wal_writer_delay (600ms by default)

The latency benefit is significant: a synchronous commit that requires an fsync typically takes 1–5ms on SSDs. An asynchronous commit returns in microseconds. For high-frequency writes, this is a 100–1000x latency improvement.

WAL and Full Page Writes

There is a subtlety in WAL that causes significant write amplification on systems with page sizes different from the disk's atomic write unit.

When a page is first modified after a checkpoint, Postgres writes the entire 8KB page into the WAL record, not just the changed bytes. This is called a Full Page Write (FPW).

Why Full Page Writes Exist

Modern disks write in 512-byte or 4096-byte sectors. If Postgres's 8KB page is partially written when the system crashes (a "torn page"), the partially-written page is corrupted. The WAL record of just the changed bytes cannot reconstruct a valid page from a torn one.

Full page writes ensure that WAL contains a complete, valid copy of every page that was modified after the last checkpoint. If a torn page is found during recovery, the full page image from WAL overwrites it completely.

ini
# Full page writes are on by default — do not disable
full_page_writes = on

The Write Amplification Implication

After every checkpoint, the first write to each page generates a WAL record that is 8KB + header overhead instead of just the changed bytes.

On a write-heavy workload with frequent checkpoints:

If checkpoint_timeout = 1min and you have 10,000 dirty pages
Every minute, those 10,000 pages get full page writes after the checkpoint
That's 80MB of WAL per minute from FPWs alone, regardless of actual change size

This is why aggressive checkpoint tuning increases WAL volume. A checkpoint every minute means every page that keeps being modified gets a full page write every minute. A checkpoint every 15 minutes means each such page gets a full page write every 15 minutes: up to 15x less FPW overhead for the same set of hot pages.
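The arithmetic above generalizes to any working set and checkpoint interval. This is a rough model, assuming every hot page is modified at least once per interval and ignoring record headers, free-space "hole" removal, and wal_compression, all of which change the real number. The function names are hypothetical.

```python
PAGE_SIZE = 8192  # default Postgres block size in bytes


def fpw_bytes_per_checkpoint(hot_pages: int) -> int:
    """WAL bytes spent on full page images in one checkpoint cycle."""
    return hot_pages * PAGE_SIZE


def fpw_bytes_per_hour(hot_pages: int, checkpoint_interval_min: float) -> float:
    """Full-page-image WAL per hour for a given checkpoint interval."""
    checkpoints_per_hour = 60 / checkpoint_interval_min
    return fpw_bytes_per_checkpoint(hot_pages) * checkpoints_per_hour


every_minute = fpw_bytes_per_hour(10_000, 1)
every_15_min = fpw_bytes_per_hour(10_000, 15)
print(fpw_bytes_per_checkpoint(10_000))  # prints 81920000 (about 80MB)
print(every_minute / every_15_min)       # prints 15.0
```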

sql
-- Measure FPW overhead (pg_stat_wal requires PostgreSQL 14+)
SELECT
  wal_records,
  wal_fpi,                          -- full page images written
  wal_bytes,
  pg_size_pretty(wal_bytes) AS wal_size
FROM pg_stat_wal;

-- High wal_fpi relative to wal_records = lots of FPW overhead
-- Usually caused by frequent checkpoints or large dirty working set

WAL Retention and `pg_wal` Sizing

After a checkpoint, segments that are no longer needed for recovery can be recycled or removed. Postgres keeps every segment back to the oldest position that anything still needs: the last checkpoint's redo point, the oldest replication slot's restart_lsn, the wal_keep_size reserve, and any segment not yet archived.

What Keeps WAL Around

Replication slots: if a standby falls behind and has a replication slot, Postgres retains all WAL since that slot's restart_lsn. A lagging or disconnected standby with a slot will cause pg_wal/ to grow without bound unless max_slot_wal_keep_size (PostgreSQL 13+) caps it.

sql
-- Check replication slot lag
SELECT
  slot_name,
  slot_type,
  active,
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
  ) AS retained_wal_size
FROM pg_replication_slots;

If retained_wal_size is growing and the slot is not active, you have a disconnected standby holding WAL hostage. This will eventually fill your disk.

wal_keep_size — the minimum amount of WAL to keep, regardless of slots or replication need:

ini
# Keep at least 1GB of WAL even after it's no longer needed for recovery
wal_keep_size = 1GB

This is useful when standbys connect without replication slots (they can catch up from retained WAL rather than needing a base backup).

Sizing `pg_wal`

Recommended pg_wal partition size for production:

pg_wal size = max_wal_size * 2  +  wal_keep_size  +  replication lag buffer

For a system with max_wal_size = 4GB and wal_keep_size = 2GB, allocate at least 12–16GB for pg_wal. Running out of space in pg_wal causes Postgres to PANIC and shut down — there is no graceful degradation.
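The sizing rule is simple enough to keep as a small helper next to your capacity plan. The lag buffer is not defined by Postgres; the 2GB and 6GB values below are assumptions chosen to reproduce the 12–16GB range quoted above.

```python
def pg_wal_partition_gb(max_wal_size_gb: float,
                        wal_keep_size_gb: float,
                        lag_buffer_gb: float) -> float:
    """Suggested pg_wal partition size using the rule of thumb above."""
    return 2 * max_wal_size_gb + wal_keep_size_gb + lag_buffer_gb


low = pg_wal_partition_gb(4, 2, lag_buffer_gb=2)   # modest standby lag allowance
high = pg_wal_partition_gb(4, 2, lag_buffer_gb=6)  # generous allowance
print(low, high)  # prints 12 16
```

Pick the lag buffer from the largest retained_wal_size you are willing to tolerate before an alert fires, not from the average.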

Streaming Replication: WAL as the Replication Protocol

Postgres replication is built directly on WAL. There is no separate change data capture layer, no triggers, no logical change format at the physical replication level. The standby receives the WAL stream and replays it.

Physical Replication Architecture

Primary:
  Backend processes → WAL buffer → WAL writer → pg_wal segments
                                               ↓
                                         WAL sender process

Standby:
  WAL receiver process → pg_wal segments → startup process (recovery) → heap files

The WAL sender on the primary reads WAL segments and streams them to connected standbys. The WAL receiver on the standby writes received WAL to its local pg_wal/ and signals the startup process to apply it.

The startup process on the standby is running in recovery mode — it is perpetually replaying WAL, exactly as if it were recovering from a crash. The only difference is it never finishes: it keeps waiting for more WAL to arrive.

Replication Lag

Replication lag has multiple components, each measurable independently:

sql
-- On the primary: view all connected standbys
SELECT
  application_name,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS total_lag_size,
  pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn))  AS network_lag_size,
  pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn)) AS flush_lag_size,
  pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn)) AS apply_lag_size,
  -- time-based lag (intervals), derived from standby feedback
  write_lag,
  flush_lag,
  replay_lag
FROM pg_stat_replication;

Network lag (sent - write): WAL sent but not yet written to standby's disk. Network bandwidth bottleneck.
Flush lag (write - flush): Written to standby disk but not fsynced. Standby I/O bottleneck.
Apply lag (flush - replay): Fsynced but not yet applied to standby's heap. CPU/disk bottleneck on replay.

Apply lag is the most dangerous for read-after-write workloads on standbys: a query routed to the standby cannot see any transaction beyond replay_lsn, no matter how recently the client saw it commit on the primary.
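The three components are plain differences between consecutive LSN columns, so they always add up to the total. A minimal sketch that mirrors the query's arithmetic outside the database; the LSN values are made up and the helper names are hypothetical.

```python
def parse_lsn(lsn: str) -> int:
    """Convert 'X/Y' (hex high half / hex low half) to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_breakdown(sent: str, write: str, flush: str, replay: str) -> dict:
    """Split replication lag (in bytes) into its three components."""
    s, w, f, r = (parse_lsn(x) for x in (sent, write, flush, replay))
    return {
        "network": s - w,  # sent but not yet written on the standby
        "flush": w - f,    # written but not yet fsynced
        "apply": f - r,    # fsynced but not yet replayed
        "total": s - r,
    }


# Invented sample row from pg_stat_replication
lag = lag_breakdown("2/4F3A1820", "2/4F3A1000", "2/4F3A0000", "2/4F000000")
print(lag)
```

In this sample almost all of the lag is apply lag, which points at replay on the standby and not at the network.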

Synchronous Replication

By default, replication is asynchronous. The primary does not wait for the standby before returning COMMIT. To require standby confirmation:

ini
# On primary — require at least one standby to confirm before commit
synchronous_standby_names = 'standby1'

# What "confirm" means (see synchronous_commit table above)
synchronous_commit = remote_apply  # standby has applied the transaction

The cost: every commit now has latency equal to the network round-trip + standby flush/apply time. For a standby 5ms away, commit latency increases by at least 5ms. For a cross-region standby, this can be 50–100ms per commit — catastrophic for transactional workloads.
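A back-of-the-envelope model shows why distance matters so much. It is a simplification that treats local flush, network round trip, and standby work as strictly sequential, and all the millisecond inputs are illustrative assumptions, not measurements.

```python
def sync_commit_latency_ms(local_flush_ms: float,
                           round_trip_ms: float,
                           standby_work_ms: float) -> float:
    """Approximate commit latency when waiting for a synchronous standby."""
    return local_flush_ms + round_trip_ms + standby_work_ms


def max_serial_commits_per_second(latency_ms: float) -> float:
    """Ceiling on back-to-back commits from a single connection."""
    return 1000 / latency_ms


same_region = sync_commit_latency_ms(1, 5, 1)    # standby 5ms away
cross_region = sync_commit_latency_ms(1, 80, 1)  # standby 80ms away
print(same_region, round(max_serial_commits_per_second(same_region), 1))
print(cross_region, round(max_serial_commits_per_second(cross_region), 1))
```

The ceiling applies per connection. Concurrent sessions overlap their waits, so total throughput falls less sharply than the latency of each individual commit rises.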

The correct pattern for most production systems: keep synchronous_standby_names set, default synchronous_commit to local so normal traffic does not wait for the standby, and raise it on a per-transaction basis for critical writes:

sql
-- Normal transaction — async
BEGIN;
INSERT INTO events ...;
COMMIT;

-- Critical transaction — sync
BEGIN;
SET LOCAL synchronous_commit = remote_apply;
UPDATE account_balances SET amount = amount - 100 WHERE id = 1;
COMMIT; -- waits for standby confirmation

WAL-Level Settings and `wal_level`

wal_level controls how much information is written to WAL. It has three values that matter in practice:

ini
wal_level = minimal    # Minimum for crash recovery. No replication possible.
wal_level = replica    # Default. Enables streaming replication.
wal_level = logical    # Enables logical replication and logical decoding.

logical mode writes more WAL because it includes enough information for logical decoding (reconstructing row-level changes for CDC consumers like Debezium). The overhead is typically 10–30% more WAL on write-heavy workloads.

Logical Replication vs Physical Replication

	Physical	Logical
What is replicated	Raw WAL (page changes)	Row-level changes
Schema compatibility	Standby must be identical schema	Can replicate to different schema/version
Replication granularity	Entire cluster	Individual tables
Overhead	Lower	Higher (WAL decoding)
Use case	HA standby	CDC, cross-version upgrades, partial replication

Logical replication uses replication slots to track consumer position. The slot ensures WAL is retained until the consumer confirms it has processed everything. A stalled logical replication consumer is a pg_wal filling time bomb.

WAL and Write Amplification: The Full Picture

Every write in Postgres touches multiple locations. Understanding the full write amplification helps you reason about storage I/O and WAL volume.

For a single UPDATE to one row:

1. WAL buffer (memory):
   - Full page write record (~8192 bytes) if first modification post-checkpoint
   - Update record (~200-400 bytes)

2. WAL file (disk):
   - Same data, flushed on commit

3. shared_buffers (memory):
   - Old tuple marked dead (xmax set)
   - New tuple written to available space

4. Heap file (disk, deferred):
   - Eventually flushed by background writer or checkpoint

5. If indexes exist:
   - Index page modified in shared_buffers
   - Full page write record for index page (if first modification post-checkpoint)
   - Eventually flushed to index file on disk

6. Visibility Map (if applicable):
   - VM bit cleared (page is no longer all-visible)
   - WAL record for VM update

7. pg_clog / pg_xact:
   - Transaction status updated on commit

A single row update can generate 3–5 WAL records and touch 4–8 distinct on-disk locations. On a table with 5 indexes, an update touches even more. This is the true cost of an UPDATE — not the query execution time, but the write amplification cascade it triggers.

Measuring Actual WAL Generation Per Query

sql
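-- Use EXPLAIN (WAL) to see WAL generated by a query (PostgreSQL 13+)
-- Note: ANALYZE really executes the UPDATE; wrap it in BEGIN ... ROLLBACK to discard the change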
EXPLAIN (ANALYZE, WAL, BUFFERS)
UPDATE transactions
SET status = 'confirmed'
WHERE block_height = 18500050;

-- Output includes:
-- WAL: records=3 fpi=2 bytes=18432
-- records = number of WAL records written
-- fpi = full page images (up to 8KB each; less with wal_compression or mostly empty pages)
-- bytes = total WAL bytes generated

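fpi=2 in this output means two full page images were written: up to 16KB of WAL just for page images, plus the actual change data. Full page images recur only on the first modification of a page after each checkpoint, so repeating this exact statement against the same pages reports fpi=0 until the next checkpoint. The worst case is a query pattern that touches two not-yet-modified pages on every execution: at 1,000 executions per second, that's 16MB/s of WAL from FPWs alone.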
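When collecting these lines from many plans, a small parser makes the FPW share explicit. The sketch assumes the text-format line shown above and treats each image as a full 8KB page, so the FPW figure is an upper bound; `parse_wal_line` is a hypothetical helper.

```python
import re

PAGE_SIZE = 8192
WAL_LINE = re.compile(r"WAL: records=(\d+) fpi=(\d+) bytes=(\d+)")


def parse_wal_line(line: str) -> dict:
    """Extract WAL counters from an EXPLAIN (ANALYZE, WAL) output line."""
    match = WAL_LINE.search(line)
    if match is None:
        raise ValueError("no WAL counters found in: %r" % line)
    records, fpi, total = (int(g) for g in match.groups())
    fpi_bytes_max = min(fpi * PAGE_SIZE, total)  # images may be smaller than 8KB
    return {
        "records": records,
        "fpi": fpi,
        "bytes": total,
        "fpi_bytes_max": fpi_bytes_max,
        "other_bytes_min": total - fpi_bytes_max,
    }


stats = parse_wal_line("WAL: records=3 fpi=2 bytes=18432")
print(stats["fpi_bytes_max"], stats["other_bytes_min"])  # prints 16384 2048
```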

Diagnosing WAL Problems in Production

Symptom: `pg_wal` Growing Without Bound

sql
-- Step 1: Check replication slots
SELECT slot_name, active, pg_size_pretty(
  pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
) AS retained FROM pg_replication_slots;

-- Step 2: Check standby lag
SELECT application_name, replay_lag FROM pg_stat_replication;

-- Step 3: If a slot is inactive and you can afford to drop it:
SELECT pg_drop_replication_slot('stalled_slot_name');

Symptom: Frequent Checkpoint Warnings in Logs

LOG: checkpoints are occurring too frequently (9 seconds apart)
HINT: Consider increasing max_wal_size.

sql
-- Check WAL generation rate
SELECT
  pg_size_pretty(wal_bytes) AS total_wal,
  wal_records,
  wal_fpi,
  stats_reset
FROM pg_stat_wal;

-- Calculate WAL rate since stats reset
SELECT
  pg_size_pretty(
    wal_bytes / EXTRACT(EPOCH FROM (now() - stats_reset))
  ) AS wal_bytes_per_second
FROM pg_stat_wal;

If WAL generation is 100MB/s and max_wal_size = 1GB, checkpoints will trigger every 10 seconds. Either increase max_wal_size or reduce your write rate (fewer updates, batch writes, use HOT updates).
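The same division can be run in reverse to pick a max_wal_size for a target checkpoint spacing. This is a rough estimate: Postgres starts a checkpoint before the full max_wal_size has accumulated, so real intervals are somewhat shorter than the figure computed here. The function names are hypothetical.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def seconds_between_wal_checkpoints(max_wal_size_bytes: float,
                                    wal_bytes_per_second: float) -> float:
    """Rough upper bound on the time to generate max_wal_size of WAL."""
    return max_wal_size_bytes / wal_bytes_per_second


def max_wal_size_for_interval(wal_bytes_per_second: float,
                              target_seconds: float) -> float:
    """WAL volume produced during one target checkpoint interval."""
    return wal_bytes_per_second * target_seconds


interval = seconds_between_wal_checkpoints(1 * GB, 100 * MB)
needed = max_wal_size_for_interval(100 * MB, 15 * 60)
print(round(interval, 2))     # prints 10.24
print(round(needed / GB, 1))  # prints 87.9
```

At a sustained 100MB/s, reaching 15-minute checkpoints would take close to 90GB of max_wal_size, which is why reducing the write rate is usually part of the answer.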

Symptom: High Replication Lag Spike During Checkpoint

During a checkpoint, the I/O load on the primary increases as dirty pages are flushed. This I/O contention can delay WAL sender from keeping up with new WAL, causing lag spikes on standbys.

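Fix: tune checkpoint_completion_target = 0.9 to spread I/O. On the standby, recovery_prefetch (PostgreSQL 15+, sized by maintenance_io_concurrency) lets replay prefetch the blocks it is about to modify; effective_io_concurrency does not affect recovery.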

Symptom: Standby Apply Lag Grows Under Write Load

Apply lag growing means the standby's recovery process can't apply WAL as fast as it arrives. This is CPU-bound on the standby (single-threaded WAL apply).

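Parallel apply exists only for logical replication: PostgreSQL 16 added parallel apply workers on subscribers. It does not speed up a physical standby.

ini
# On a logical replication subscriber (PostgreSQL 16+, subscription created with streaming = parallel)
max_parallel_apply_workers_per_subscription = 4
# On a physical standby: leave at 0 (the default); any other value delays apply on purpose
recovery_min_apply_delay = 0

For physical replication, WAL replay runs in a single startup process in every current release. The fix is usually to ensure the standby has faster single-core performance and storage, or to reduce write load on the primary.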

Production Tuning Reference

ini
# WAL generation
wal_level = replica                    # or logical if CDC is needed
full_page_writes = on                  # never disable

# WAL buffering
wal_buffers = 64MB                     # increase from default for write-heavy workloads
wal_writer_delay = 200ms               # default is fine

# Checkpoint behavior
checkpoint_timeout = 15min             # reduce checkpoint frequency
max_wal_size = 4GB                     # for high-write systems, go higher
min_wal_size = 1GB
checkpoint_completion_target = 0.9

# Durability
synchronous_commit = on                # never disable globally
# Use SET LOCAL synchronous_commit = off for specific high-freq writes

# Replication retention
wal_keep_size = 2GB                    # safety buffer for standbys without slots

# Archive (if using PITR)
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'
archive_timeout = 60                   # force a segment switch once the current segment is 60s old and has new WAL

The Production Incident: WAL Archiving Stall Causing 6-Hour Replica Lag

Context: A blockchain indexer processing ~2,000 TPS with a primary and two physical standbys.

What happened:

An archive_command was configured to copy WAL segments to an NFS mount. During a network partition, the NFS mount became unresponsive. The archive_command hung — it did not fail, it just never returned.

Postgres was waiting for the archive command to complete before allowing WAL segment recycling. But with the archive command hanging, no segments could be recycled. pg_wal/ grew until it hit the disk limit, at which point Postgres paused WAL writing.

Paused WAL writing means no commits can complete. The primary appeared to hang. All application writes stalled.

The standbys: they had received WAL up to the pause point and were applying it, but since the primary was paused, their apply caught up quickly — and then they were waiting for new WAL. Replication appeared healthy. The lag was invisible until the primary recovered.

After the partition healed: The archive command eventually completed or timed out. Postgres resumed WAL writing. But the WAL sender had to catch up standbys on all the writes that piled up during the stall — leading to 6 hours of visible replication lag as the standbys processed the backlog.

The fixes:

bash
# 1. Set a timeout on the archive command
archive_command = 'timeout 30 cp %p /mnt/wal_archive/%f'

# 2. Monitor archive status

sql
-- Check for archive failures
SELECT
  archived_count,
  last_archived_wal,
  last_archived_time,
  failed_count,
  last_failed_wal,
  last_failed_time,
  stats_reset
FROM pg_stat_archiver;

ini
# 3. Set a maximum archive wait time
# If archive fails, Postgres retries — add alerting on last_failed_time

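The lesson: WAL archiving stalls are silent until they're catastrophic. pg_stat_archiver.failed_count incrementing without alerting is a ticking clock toward a primary outage, and a command that hangs never increments it at all, so alert on the age of last_archived_time as well.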

Summary

Concept	Key Takeaway
WAL purpose	Sequential writes for durability; heap pages written lazily in background
Commit durability	WAL flushed to disk before COMMIT returns (by default). Heap page can wait.
Full page writes	First write to each page post-checkpoint writes full 8KB to WAL — major write amplification driver
Checkpoints	Bound recovery time by flushing all dirty pages; too-frequent checkpoints = excessive FPW overhead
`synchronous_commit`	`off` trades up to three times `wal_writer_delay` (600ms by default) of durability for commit latency. Acceptable for some workloads.
Write amplification	A single UPDATE touches WAL buffer, WAL file, heap buffer, heap file, index buffer, index file, VM
Streaming replication	Standby receives WAL stream and replays it in perpetual recovery mode
Replication lag	Three components: network lag, flush lag, apply lag — measure each independently
Replication slots	Retain WAL for consumers; stalled slots grow `pg_wal` without bound
WAL archiving	`archive_command` must have a timeout; stalls silently cause disk exhaustion

WAL is the foundation everything else in Postgres is built on — replication, PITR, crash recovery, and even some of the MVCC mechanics you saw in Module 2. Module 4 goes into the process that runs on top of MVCC and WAL to keep your database healthy: Autovacuum.

Next: Module 4 — Autovacuum: The Process Everyone Misconfigures →
