Every byte you write passes through WAL. Understanding WAL is understanding your write amplification.
Module 3 — Write-Ahead Logging: Durability, Replication, and the Price of Every Write
What this module covers: The WAL mechanism from first principles — why it exists, what is physically written, how checkpoints and background writes interact, how streaming replication flows from WAL, and the production consequences of misconfiguring any of it. By the end, you will be able to calculate write amplification for any workload and diagnose WAL-related performance problems from pg_stat_* views alone.
Why WAL Exists: The Durability Problem
Consider what happens when you commit a transaction in a naive database implementation.
The data must eventually reach disk. Disk writes are expensive — an 8KB page write on a spinning disk takes 5–10ms. A modern OLTP system might commit thousands of transactions per second. If every commit required flushing all modified heap pages to disk synchronously, throughput would collapse.
But you cannot skip the flush. If the system crashes before modified pages reach disk, those committed transactions are lost. That violates Durability — the D in ACID.
The naive options are both unacceptable:
- Flush heap pages on every commit → terrible write throughput
- Don't flush → committed data lost on crash
WAL is the solution to this dilemma.
Instead of flushing heap pages on commit, Postgres flushes a compact sequential record of what changed. This record — the Write-Ahead Log — is small, sequential, and fast to write. Heap pages can be flushed lazily in the background.
On crash recovery, Postgres replays the WAL to reconstruct any committed changes that hadn't yet made it to heap pages. The WAL is the authoritative record of what happened. The heap is a derived, materialized form of that record.
This is the core invariant of WAL: a transaction's WAL record must be on disk before the commit returns to the client. The heap page can wait. The WAL cannot.
Physical Structure of the WAL
Segments, Pages, and Records
WAL is stored in $PGDATA/pg_wal/ as a series of segment files, each 16MB by default (configurable at initdb time via --wal-segsize).
$PGDATA/pg_wal/
├── 000000010000000000000001
├── 000000010000000000000002
├── 000000010000000000000003
└── ...
The filename is 24 hexadecimal digits, encoding three 8-digit fields:
- Timeline ID (`00000001`): identifies the database history branch. It changes after point-in-time recovery or a standby promotion.
- High part (`00000000`): the upper 32 bits of the LSN at which the segment starts.
- Low part (`00000001`): the segment's index within that 4GB range. With 16MB segments it runs from `00000000` to `000000FF`.
Each 16MB segment is divided into 8KB pages (matching the heap page size). Each page has a header. Within pages, WAL is written as a stream of variable-length WAL records.
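To make the naming scheme concrete, here is a minimal Python sketch that splits a segment filename into its fields and derives the byte offset at which the segment starts. It assumes the default 16MB segment size; the function name `decode_wal_filename` is illustrative and not part of any Postgres tooling.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16MB --wal-segsize


def decode_wal_filename(name: str, segment_size: int = WAL_SEGMENT_SIZE) -> dict:
    """Split a 24-hex-digit WAL segment name into timeline and segment number."""
    if len(name) != 24:
        raise ValueError("expected 24 hexadecimal characters")
    timeline = int(name[0:8], 16)
    high = int(name[8:16], 16)   # upper 32 bits of the segment's start LSN
    low = int(name[16:24], 16)   # segment index within that 4GB range
    segments_per_high = 0x1_0000_0000 // segment_size  # 256 for 16MB segments
    segment_number = high * segments_per_high + low
    return {
        "timeline": timeline,
        "segment_number": segment_number,
        "start_offset_bytes": segment_number * segment_size,
    }


info = decode_wal_filename("000000010000000000000003")
print(info)
```

On a live server, the built-in function pg_walfile_name() performs the reverse mapping from an LSN to a filename.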
LSN: The Coordinate System
Every position in the WAL stream is identified by a Log Sequence Number (LSN) — a 64-bit integer representing the byte offset from the beginning of the WAL.
```sql
-- Current write position (WAL written out from the WAL buffers so far)
SELECT pg_current_wal_lsn();
--  pg_current_wal_lsn
-- --------------------
--  2/4F3A1820

-- Current flush position (what has been flushed to disk)
SELECT pg_current_wal_flush_lsn();

-- Total WAL generated since the cluster was initialized (not since the last restart)
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'));
```
LSNs appear everywhere in Postgres: in page headers (pd_lsn records the LSN of the last WAL record that modified the page), in replication slots, in recovery targets for PITR, and in monitoring views.
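The textual form of an LSN is two hexadecimal numbers: the upper and lower 32 bits of the byte offset. A short Python sketch of the conversion, and of the subtraction that pg_wal_lsn_diff() performs, is shown here. The helper names are hypothetical.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as '2/4F3A1820' to a 64-bit byte offset."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Convert a byte offset back to the 'high/low' hexadecimal form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def lsn_diff(a: str, b: str) -> int:
    """Bytes of WAL between two LSNs (positive when a is ahead of b)."""
    return parse_lsn(a) - parse_lsn(b)


print(parse_lsn("2/4F3A1820"))
print(lsn_diff("0/2000000", "0/1000000"))
```

Because an LSN is just a byte offset, every "how much WAL" question in this module reduces to integer subtraction.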
Anatomy of a WAL Record
Every WAL record has:
WALRecordHeader:
xl_tot_len uint32 — total length of the record
xl_xid TransactionId — transaction that generated this record
xl_prev XLogRecPtr — LSN of the previous record (for backward traversal)
xl_info uint8 — flags and resource manager subtype
xl_rmid RmgrId — resource manager ID (identifies record type)
xl_crc pg_crc32c — CRC of the record
WALRecordData:
[variable-length payload — format depends on resource manager]
The resource manager (xl_rmid) identifies what kind of operation this record describes. Key resource managers:
| rmid | Name | What it logs |
|---|---|---|
| 0 | XLOG | Checkpoint records, backup labels |
| 1 | Transaction | COMMIT, ROLLBACK, PREPARE |
| 2 | Storage | File creation/deletion |
| 3 | CLOG | Transaction status page updates |
| 9 | Heap2 | VACUUM, FREEZE, visibility map updates |
| 10 | Heap | INSERT, UPDATE, DELETE, HOT update |
| 11 | Btree | B-tree splits, page deletions |
What an INSERT Actually Writes to WAL
When you insert a row, the WAL record for the heap contains:
- The full tuple data, enough to reconstruct the row on recovery
- The target page and offset — where the tuple was placed in the heap
- The transaction ID — for visibility tracking
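For an UPDATE, WAL contains a single heap update record that carries:
- The old tuple's page/offset and its new `xmax` (the old version is not removed, only marked as superseded)
- The new tuple's page/offset and data. When the new version lands on the same page, Postgres omits any leading or trailing bytes that are identical to the old tuple.
For a DELETE, WAL contains:
- A record setting the old tuple's `xmax`
This is why updates are expensive in Postgres: each one writes a complete new row version, so logically every update is a delete plus an insert. A non-HOT update also generates a WAL record for every index that needs a new entry.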
```sql
-- Measure WAL generated by a single UPDATE (run in psql; \gset stores the result in a variable)
SELECT pg_current_wal_lsn() AS before \gset

UPDATE transactions SET status = 'confirmed' WHERE id = 1;

SELECT pg_size_pretty(
  pg_wal_lsn_diff(pg_current_wal_lsn(), :'before')
) AS wal_generated;
-- Typical result for a row update: 200–400 bytes of WAL
```
The Write Path: From Memory to Disk
Understanding exactly when data moves from memory to disk is critical for reasoning about both performance and durability.
Shared Buffers and the WAL Buffer
Postgres has two separate in-memory write buffers:
shared_buffers — the main page cache. When you modify a page (insert, update, delete), the modified page lives here until a background writer or checkpoint flushes it to the heap file on disk. A dirty page in shared_buffers is a performance optimization — it defers expensive random writes.
WAL buffers (wal_buffers, by default 1/32 of shared_buffers, capped at one WAL segment, so 16MB on most production systems) are a circular buffer in shared memory where WAL records are accumulated before being written to the WAL segment files. WAL records flow from the WAL buffer to disk much more frequently than heap pages.
When WAL Is Flushed
WAL is written out and flushed to disk (fsync'd) at these moments:
- Transaction commit: by default (`synchronous_commit = on`), WAL is flushed before the commit acknowledgement is sent to the client. This is what makes committed transactions durable.
- WAL buffer fills: if the circular WAL buffer fills up before a commit, older WAL is written out to the segment files to make room, and the process inserting new records waits for that write.
- The WAL writer background process: wakes up every `wal_writer_delay` milliseconds (default 200ms) and flushes any unflushed WAL. This bounds the exposure window for asynchronous commit.
- Checkpoint: all WAL through the checkpoint LSN is guaranteed to be on disk.
The WAL Writer Process
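```sql
-- Monitor data-page write activity (columns as of PostgreSQL 16 and earlier;
-- PostgreSQL 17 moved several of them to pg_stat_checkpointer and pg_stat_io)
SELECT
  buffers_checkpoint,
  buffers_clean,
  buffers_backend,
  buffers_backend_fsync,
  buffers_alloc
FROM pg_stat_bgwriter;
```
Note that pg_stat_bgwriter describes data pages written by the checkpointer, the background writer, and backends; it does not describe the WAL writer itself. buffers_backend_fsync being non-zero means backends had to issue their own fsync calls for data files, normally because the checkpointer's fsync request queue was full. This is a performance warning sign.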
Checkpoints: Bounding Recovery Time
WAL solves durability, but it creates a new problem: if the database crashes and needs to replay WAL for recovery, how far back does it need to go? In theory, the entire WAL history since the database was created.
Checkpoints solve this by periodically guaranteeing that all dirty heap pages have been flushed to disk. After a checkpoint completes, crash recovery only needs to replay WAL from the checkpoint LSN forward.
What Happens During a Checkpoint
1. The checkpointer identifies all dirty pages in `shared_buffers`.
2. Dirty pages are flushed to disk. This is the expensive part. The checkpointer spreads this work over time using `checkpoint_completion_target` to avoid a burst of I/O.
3. The checkpoint record is written to WAL, recording the checkpoint LSN.
4. The WAL is flushed, ensuring the checkpoint record is durable.
After a checkpoint:
- All heap pages that were dirty before the checkpoint start are now on disk
- Recovery can safely start from the checkpoint LSN instead of the beginning of WAL
Checkpoint Configuration
```ini
# How often to checkpoint (time-based)
checkpoint_timeout = 5min            # default; increase for write-heavy workloads

# How often to checkpoint (WAL-size-based)
max_wal_size = 1GB                   # trigger checkpoint if WAL exceeds this

# How to spread checkpoint I/O
checkpoint_completion_target = 0.9   # use 90% of checkpoint_timeout interval for I/O

# Minimum WAL to keep (even after checkpoint)
min_wal_size = 80MB
```
The checkpoint_completion_target trade-off:
Setting this to 0.9 means the checkpointer spreads its I/O across 90% of the checkpoint_timeout interval. At checkpoint_timeout = 5min, that's 270 seconds of smooth I/O. This reduces I/O spikes, but a checkpoint only shortens recovery once it has completed, so a crash can require replaying more than 5 minutes of WAL: everything written since the last completed checkpoint started.
Setting it to 0.1 means the checkpointer writes aggressively at the start — high I/O spike, but recovery is faster.
For most production systems: checkpoint_completion_target = 0.9 and checkpoint_timeout = 15min with max_wal_size = 4GB is a reasonable starting point.
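The timing arithmetic behind these settings can be sketched in a few lines of Python. The write window follows directly from the two parameters. The worst-case replay figure is an approximation that assumes a crash just before an in-progress checkpoint completes, so recovery restarts from the start of the previous one. The function name is hypothetical.

```python
def checkpoint_timing(checkpoint_timeout_s: float, completion_target: float) -> dict:
    """Estimate how long checkpoint writes are spread and how much WAL time may need replay."""
    write_window_s = checkpoint_timeout_s * completion_target
    # Approximation: previous interval plus the unfinished checkpoint's write window.
    worst_case_replay_s = checkpoint_timeout_s + write_window_s
    return {
        "write_window_s": write_window_s,
        "worst_case_replay_s": worst_case_replay_s,
    }


for timeout_min in (5, 15):
    timing = checkpoint_timing(timeout_min * 60, 0.9)
    print(timeout_min, "min:", timing)
```

Note that this counts seconds of WAL generated, not seconds of recovery time; actual replay speed depends on the standby or recovering server's hardware.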
Detecting Checkpoint Problems
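```sql
-- Checkpoints happening too frequently = WAL being generated faster than max_wal_size allows
-- (PostgreSQL 16 and earlier; PostgreSQL 17 moved these counters to pg_stat_checkpointer)
SELECT
  checkpoints_timed,
  checkpoints_req,        -- req = requested: triggered by WAL size or a manual CHECKPOINT, not by time
  checkpoint_write_time,
  checkpoint_sync_time,
  buffers_checkpoint
FROM pg_stat_bgwriter;
```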
If checkpoints_req is much higher than checkpoints_timed, your max_wal_size is too small for your write rate. Each requested checkpoint (other than a manual CHECKPOINT) means WAL volume approached max_wal_size before the timeout elapsed. Increase max_wal_size.
```sql
-- Also check the PostgreSQL log (requires log_checkpoints = on) for:
--   LOG:  checkpoint starting: wal
-- This means WAL-triggered, not time-triggered
```
synchronous_commit: Durability vs Latency
synchronous_commit controls when a COMMIT returns to the client. This is the most important durability knob in Postgres, and it is frequently misunderstood.
| Setting | Behavior | Durability Risk |
|---|---|---|
| `on` (default) | WAL flushed to primary disk before COMMIT returns (and to the synchronous standby's disk, if one is configured) | None |
| `remote_write` | WAL written (not flushed) to standby before COMMIT returns | Data is lost only if the primary fails and the standby's OS crashes before its flush |
| `remote_apply` | WAL applied on standby before COMMIT returns | None: standby has applied data |
| `local` | WAL flushed to primary only, regardless of standby | Standby can lag |
| `off` | COMMIT returns without waiting for WAL flush | Up to 3 × wal_writer_delay (600ms at the default) of committed data lost on crash |
When to Use synchronous_commit = off
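The risk is small and bounded: if the server crashes, the most recently committed transactions may be lost. The documented worst case is three times wal_writer_delay (600ms at the default 200ms), because the WAL writer favors writing whole pages during busy periods. Be clear about what this means: the client receives COMMIT and the transaction may genuinely be lost. What cannot happen is corruption; the database still recovers to a consistent state.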
This is acceptable for:
- Session-level analytics queries that write intermediate results
- Audit log inserts where losing a few records on crash is acceptable
- High-throughput event ingestion where the business can tolerate small data loss
It is not acceptable for:
- Financial transactions
- Any data where "committed means durable" is a contract with the user
```sql
-- Enable asynchronous commit for a single session
SET synchronous_commit = off;

-- Or for a specific transaction
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO event_log ...;
COMMIT;  -- returns immediately; the WAL writer flushes the WAL within 3 x wal_writer_delay
```
The latency benefit is significant: a synchronous commit that requires an fsync typically takes 1–5ms on SSDs. An asynchronous commit returns in microseconds. For high-frequency writes, this is a 100–1000x latency improvement.
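To judge whether asynchronous commit is acceptable, it helps to turn the risk window into a number of transactions. The sketch below assumes the worst-case window from the PostgreSQL documentation (three times wal_writer_delay) and a steady commit rate; the 2,000 commits per second default is a hypothetical input, not a measurement.

```python
def async_commit_exposure(wal_writer_delay_ms: int = 200,
                          commits_per_second: float = 2000.0) -> dict:
    """Worst-case count of acknowledged commits that a crash could lose."""
    risk_window_ms = 3 * wal_writer_delay_ms  # documented upper bound
    commits_at_risk = commits_per_second * risk_window_ms / 1000.0
    return {"risk_window_ms": risk_window_ms, "commits_at_risk": commits_at_risk}


exposure = async_commit_exposure()
print(exposure)
```

Lowering wal_writer_delay shrinks the window proportionally, at the cost of more frequent WAL flushes.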
WAL and Full Page Writes
There is a subtlety in WAL that causes significant write amplification on systems with page sizes different from the disk's atomic write unit.
When a page is first modified after a checkpoint, Postgres writes the entire 8KB page into the WAL record, not just the changed bytes. This is called a Full Page Write (FPW).
Why Full Page Writes Exist
Modern disks write in 512-byte or 4096-byte sectors. If Postgres's 8KB page is partially written when the system crashes (a "torn page"), the partially-written page is corrupted. The WAL record of just the changed bytes cannot reconstruct a valid page from a torn one.
Full page writes ensure that WAL contains a complete, valid copy of every page that was modified after the last checkpoint. If a torn page is found during recovery, the full page image from WAL overwrites it completely.
```ini
# Full page writes are on by default; do not disable
full_page_writes = on
```
The Write Amplification Implication
After every checkpoint, the first write to each page generates a WAL record that is 8KB + header overhead instead of just the changed bytes.
On a write-heavy workload with frequent checkpoints:
- If `checkpoint_timeout = 1min` and you have 10,000 dirty pages
- Every minute, those 10,000 pages get full page writes after the checkpoint
- That's 80MB of WAL per minute from FPWs alone, regardless of actual change size
This is why aggressive checkpoint tuning increases WAL volume. A checkpoint every minute means every page gets a full page write every minute. A checkpoint every 15 minutes means each page gets a full page write every 15 minutes — 15x less FPW overhead.
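The same arithmetic as a small Python sketch, under the simplifying assumption that a fixed set of hot pages is modified at least once in every checkpoint interval and that each full page image costs a whole 8KB page (compression and hole removal can make real images smaller). The function name is illustrative.

```python
PAGE_SIZE = 8192  # bytes per heap page


def fpw_bytes_per_hour(hot_pages: int, checkpoint_interval_minutes: float) -> float:
    """Upper-bound WAL volume from full page writes, per hour."""
    checkpoints_per_hour = 60.0 / checkpoint_interval_minutes
    return hot_pages * PAGE_SIZE * checkpoints_per_hour


every_minute = fpw_bytes_per_hour(10_000, 1)
every_15_minutes = fpw_bytes_per_hour(10_000, 15)
print(every_minute, every_15_minutes, every_minute / every_15_minutes)
```

The ratio between the two results is the 15x difference described above; the absolute numbers scale linearly with the size of the hot page set.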
```sql
-- Measure FPW overhead
SELECT
  wal_records,
  wal_fpi,      -- full page images written
  wal_bytes,
  pg_size_pretty(wal_bytes) AS wal_size
FROM pg_stat_wal;
-- High wal_fpi relative to wal_records = lots of FPW overhead
-- Usually caused by frequent checkpoints or large dirty working set
```
WAL Retention and pg_wal Sizing
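After a checkpoint, segments that are no longer needed for recovery can be recycled or removed. Postgres keeps every segment from the oldest position still required (the last checkpoint's redo point, the oldest replication slot's restart_lsn, any segment not yet archived, or the wal_keep_size floor, whichever is oldest) up to the current LSN.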
What Keeps WAL Around
Replication slots — if a standby falls behind and has a replication slot, Postgres retains all WAL since that slot's restart_lsn. A lagging or disconnected standby with a slot will cause pg_wal/ to grow without bound.
```sql
-- Check replication slot lag
SELECT
  slot_name,
  slot_type,
  active,
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
  ) AS retained_wal_size
FROM pg_replication_slots;
```
If retained_wal_size is growing and the slot is not active, you have a disconnected standby holding WAL hostage. This will eventually fill your disk.
wal_keep_size — the minimum amount of WAL to keep, regardless of slots or replication need:
```ini
# Keep at least 1GB of WAL even after it's no longer needed for recovery
wal_keep_size = 1GB
```
This is useful when standbys connect without replication slots (they can catch up from retained WAL rather than needing a base backup).
Sizing pg_wal
Recommended pg_wal partition size for production:
pg_wal size = max_wal_size * 2 + wal_keep_size + replication lag buffer
For a system with max_wal_size = 4GB and wal_keep_size = 2GB, allocate at least 12–16GB for pg_wal. Running out of space in pg_wal causes Postgres to PANIC and shut down — there is no graceful degradation.
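The sizing rule can be written as a one-line function. This is a sketch of the formula above, not an official recommendation; the replication lag buffer is an input you have to choose, and the 2GB to 6GB range used here is simply what makes the result match the 12–16GB guidance.

```python
GIB = 1024 ** 3


def recommended_pg_wal_bytes(max_wal_size: int, wal_keep_size: int,
                             replication_lag_buffer: int) -> int:
    """pg_wal size = max_wal_size * 2 + wal_keep_size + replication lag buffer."""
    return 2 * max_wal_size + wal_keep_size + replication_lag_buffer


low = recommended_pg_wal_bytes(4 * GIB, 2 * GIB, 2 * GIB)
high = recommended_pg_wal_bytes(4 * GIB, 2 * GIB, 6 * GIB)
print(low // GIB, "to", high // GIB, "GiB")
```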
Streaming Replication: WAL as the Replication Protocol
Postgres replication is built directly on WAL. There is no separate change data capture layer, no triggers, no logical change format at the physical replication level. The standby receives the WAL stream and replays it.
Physical Replication Architecture
Primary:
Backend processes → WAL buffer → WAL writer → pg_wal segments
↓
WAL sender process
Standby:
WAL receiver process → pg_wal segments → startup process (recovery) → heap files
The WAL sender on the primary reads WAL segments and streams them to connected standbys. The WAL receiver on the standby writes received WAL to its local pg_wal/ and signals the startup process to apply it.
The startup process on the standby is running in recovery mode — it is perpetually replaying WAL, exactly as if it were recovering from a crash. The only difference is it never finishes: it keeps waiting for more WAL to arrive.
Replication Lag
Replication lag has multiple components, each measurable independently:
```sql
-- On the primary: view all connected standbys
SELECT
  application_name,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  -- Lag measured in WAL volume (aliases chosen so they do not clash with the view's own columns)
  pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn))  AS total_lag_size,
  pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn))   AS network_lag_size,
  pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn))  AS flush_lag_size,
  pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn)) AS apply_lag_size,
  -- Lag measured in time (interval columns provided by the view)
  write_lag,
  flush_lag,
  replay_lag
FROM pg_stat_replication;
```
- Network lag (`sent - write`): WAL sent but not yet written to standby's disk. Network bandwidth bottleneck.
- Flush lag (`write - flush`): Written to standby disk but not fsynced. Standby I/O bottleneck.
- Apply lag (`flush - replay`): Fsynced but not yet applied to standby's heap. CPU/disk bottleneck on replay.
Apply lag is the most dangerous for read-after-write workloads on standbys — a query routed to the standby may see stale data by exactly apply_lag worth of transactions.
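The three components are plain differences between the four LSN columns, and they always sum to the total. A minimal Python sketch with made-up LSN values (the function names and sample positions are hypothetical):

```python
def parse_lsn(text: str) -> int:
    """Convert 'high/low' hexadecimal LSN text to a byte offset."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_components(sent: str, write: str, flush: str, replay: str) -> dict:
    """Break replication lag (in bytes of WAL) into its three stages."""
    s, w, f, r = (parse_lsn(x) for x in (sent, write, flush, replay))
    return {"network": s - w, "flush": w - f, "apply": f - r, "total": s - r}


# Sample values only; real ones come from pg_stat_replication.
lag = lag_components("0/5000000", "0/4000000", "0/3000000", "0/1000000")
print(lag)
```

Whichever component dominates tells you where to look: the network, the standby's storage, or its replay process.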
Synchronous Replication
By default, replication is asynchronous. The primary does not wait for the standby before returning COMMIT. To require standby confirmation:
```ini
# On primary: require at least one standby to confirm before commit
synchronous_standby_names = 'standby1'

# What "confirm" means (see synchronous_commit table above)
synchronous_commit = remote_apply   # standby has applied the transaction
```
The cost: every commit now has latency equal to the network round-trip + standby flush/apply time. For a standby 5ms away, commit latency increases by at least 5ms. For a cross-region standby, this can be 50–100ms per commit — catastrophic for transactional workloads.
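A rough way to see why this matters: a single connection that commits one transaction at a time cannot commit faster than one over the commit latency. The sketch below is a simplified model that ignores query execution time and group commit; the latency inputs are illustrative.

```python
def max_serial_commits_per_second(local_flush_ms: float,
                                  standby_round_trip_ms: float) -> float:
    """Upper bound on commits/second for one connection committing serially."""
    return 1000.0 / (local_flush_ms + standby_round_trip_ms)


local_only = max_serial_commits_per_second(1.0, 0.0)
nearby_standby = max_serial_commits_per_second(1.0, 5.0)
cross_region = max_serial_commits_per_second(1.0, 75.0)
print(local_only, round(nearby_standby, 1), round(cross_region, 1))
```

Total throughput can still be recovered with more concurrent connections, but each individual transaction pays the full latency.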
The correct pattern for most production systems: keep synchronous_standby_names set, make synchronous_commit = local the default so that normal traffic does not wait for the standby, and raise the level on a per-transaction basis for critical writes:
```sql
-- Normal transaction: does not wait for the standby
-- (assumes synchronous_commit = local is the configured default)
BEGIN;
INSERT INTO events ...;
COMMIT;

-- Critical transaction: sync
BEGIN;
SET LOCAL synchronous_commit = remote_apply;
UPDATE account_balances SET amount = amount - 100 WHERE id = 1;
COMMIT;  -- waits for standby confirmation
```
WAL-Level Settings and wal_level
wal_level controls how much information is written to WAL. It has three values that matter in practice:
```ini
# One of:
wal_level = minimal   # Minimum for crash recovery. No replication possible.
wal_level = replica   # Default. Enables streaming replication.
wal_level = logical   # Enables logical replication and logical decoding.
```
logical mode writes more WAL because it includes enough information for logical decoding (reconstructing row-level changes for CDC consumers like Debezium). The overhead is typically 10–30% more WAL on write-heavy workloads.
Logical Replication vs Physical Replication
| | Physical | Logical |
|---|---|---|
| What is replicated | Raw WAL (page changes) | Row-level changes |
| Schema compatibility | Standby must be identical schema | Can replicate to different schema/version |
| Replication granularity | Entire cluster | Individual tables |
| Overhead | Lower | Higher (WAL decoding) |
| Use case | HA standby | CDC, cross-version upgrades, partial replication |
Logical replication uses replication slots to track consumer position. The slot ensures WAL is retained until the consumer confirms it has processed everything. A stalled logical replication consumer is a pg_wal filling time bomb.
WAL and Write Amplification: The Full Picture
Every write in Postgres touches multiple locations. Understanding the full write amplification helps you reason about storage I/O and WAL volume.
For a single UPDATE to one row:
1. WAL buffer (memory):
   - Full page write record (~8192 bytes) if first modification post-checkpoint
   - Update record (~200-400 bytes)
2. WAL file (disk):
   - Same data, flushed on commit
3. shared_buffers (memory):
   - Old tuple marked dead (xmax set)
   - New tuple written to available space
4. Heap file (disk, deferred):
   - Eventually flushed by background writer or checkpoint
5. If indexes exist:
   - Index page modified in shared_buffers
   - Full page write record for index page (if first modification post-checkpoint)
   - Eventually flushed to index file on disk
6. Visibility Map (if applicable):
   - VM bit cleared (page is no longer all-visible)
   - WAL record for VM update
7. pg_clog / pg_xact:
   - Transaction status updated on commit
A single row update can generate 3–5 WAL records and touch 4–8 distinct on-disk locations. On a table with 5 indexes, an update touches even more. This is the true cost of an UPDATE — not the query execution time, but the write amplification cascade it triggers.
Measuring Actual WAL Generation Per Query
```sql
-- Use EXPLAIN (WAL) to see WAL generated by a query.
-- ANALYZE executes the statement, so this UPDATE really runs.
EXPLAIN (ANALYZE, WAL, BUFFERS)
UPDATE transactions
SET status = 'confirmed'
WHERE block_height = 18500050;

-- Output includes:
--   WAL: records=3 fpi=2 bytes=18432
-- records = number of WAL records written
-- fpi     = full page images (up to 8KB each)
-- bytes   = total WAL bytes generated
```
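fpi=2 in this output means two full page images were written: up to 16KB of WAL just for page images, plus the actual change data. Full page images are only written the first time a page is modified after a checkpoint, so rerunning the statement against the same pages does not pay this cost again. If the query runs 1,000 times per second and each run touches pages not yet modified since the last checkpoint, that's up to 16MB/s of WAL from FPWs alone for this one query pattern.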
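When collecting these numbers across many plans, a small parser is handy. The Python sketch below extracts the three counters from a WAL line in the format shown above and estimates how much of the total is page images. It treats 8KB per image as an upper bound, since images can be smaller when compression or hole removal applies; the function name is hypothetical.

```python
import re

PAGE_SIZE = 8192
WAL_LINE = re.compile(r"WAL: records=(\d+) fpi=(\d+) bytes=(\d+)")


def summarize_wal_line(line: str) -> dict:
    """Parse an EXPLAIN (ANALYZE, WAL) line and split bytes into FPI vs. the rest."""
    match = WAL_LINE.search(line)
    if match is None:
        raise ValueError("no WAL counters found in line")
    records, fpi, total = (int(group) for group in match.groups())
    fpi_upper_bound = min(fpi * PAGE_SIZE, total)
    return {
        "records": records,
        "fpi": fpi,
        "bytes": total,
        "fpi_bytes_upper_bound": fpi_upper_bound,
        "non_fpi_bytes_lower_bound": total - fpi_upper_bound,
    }


summary = summarize_wal_line("WAL: records=3 fpi=2 bytes=18432")
print(summary)
```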
Diagnosing WAL Problems in Production
Symptom: pg_wal Growing Without Bound
```sql
-- Step 1: Check replication slots
SELECT
  slot_name,
  active,
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
  ) AS retained
FROM pg_replication_slots;

-- Step 2: Check standby lag
SELECT application_name, replay_lag FROM pg_stat_replication;

-- Step 3: If a slot is inactive and you can afford to drop it:
SELECT pg_drop_replication_slot('stalled_slot_name');
```
Symptom: Frequent Checkpoint Warnings in Logs
LOG: checkpoints are occurring too frequently (9 seconds apart)
HINT: Consider increasing max_wal_size.
```sql
-- Check WAL generation rate
SELECT
  pg_size_pretty(wal_bytes) AS total_wal,
  wal_records,
  wal_fpi,
  stats_reset
FROM pg_stat_wal;

-- Calculate WAL rate since stats reset
SELECT pg_size_pretty(
  wal_bytes / EXTRACT(EPOCH FROM (now() - stats_reset))
) AS wal_bytes_per_second
FROM pg_stat_wal;
```
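If WAL generation is 100MB/s and max_wal_size = 1GB, checkpoints will trigger roughly every 10 seconds, and in practice somewhat sooner because Postgres starts a checkpoint before WAL actually reaches max_wal_size. Either increase max_wal_size or reduce your write rate (fewer updates, batch writes, use HOT updates).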
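The interval estimate can be sketched in Python. The naive figure divides max_wal_size by the WAL rate. The adjusted figure assumes Postgres (version 11 and later) starts a checkpoint once WAL since the last one reaches max_wal_size / (1 + checkpoint_completion_target); treat it as an approximation of the server's internal calculation rather than an exact prediction.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def checkpoint_interval_seconds(max_wal_size_bytes: float,
                                wal_bytes_per_second: float,
                                completion_target: float = 0.9) -> dict:
    """Estimate seconds between WAL-triggered checkpoints."""
    if wal_bytes_per_second <= 0:
        raise ValueError("WAL rate must be positive")
    naive_s = max_wal_size_bytes / wal_bytes_per_second
    # Assumed trigger distance: leave room for WAL written while the checkpoint runs.
    trigger_bytes = max_wal_size_bytes / (1.0 + completion_target)
    adjusted_s = trigger_bytes / wal_bytes_per_second
    return {"naive_s": naive_s, "adjusted_s": adjusted_s}


estimate = checkpoint_interval_seconds(1 * GB, 100 * MB)
print(estimate)
```

Running it with max_wal_size raised to a candidate value shows quickly whether the new setting would push WAL-triggered checkpoints past checkpoint_timeout.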
Symptom: High Replication Lag Spike During Checkpoint
During a checkpoint, the I/O load on the primary increases as dirty pages are flushed. This I/O contention can delay WAL sender from keeping up with new WAL, causing lag spikes on standbys.
Fix: tune checkpoint_completion_target = 0.9 to spread I/O on the primary. On the standby, recovery_prefetch (PostgreSQL 15+, with its concurrency bounded by maintenance_io_concurrency) allows I/O prefetching during recovery.
Symptom: Standby Apply Lag Grows Under Write Load
Apply lag growing means the standby's recovery process can't apply WAL as fast as it arrives. This is CPU-bound on the standby (single-threaded WAL apply).
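As of PostgreSQL 17, physical WAL replay is not parallel: the startup process applies records one at a time. The parallel apply setting that does exist is for logical replication subscribers:
```ini
# On a logical replication subscriber (PostgreSQL 16+)
max_parallel_apply_workers_per_subscription = 4   # parallel apply of large streamed transactions

# On a physical standby
recovery_min_apply_delay = 0                      # default; confirm replay is not being delayed on purpose
```
For physical replication, the fix is usually to ensure the standby has faster single-core performance and storage, or to reduce write load on the primary.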
Production Tuning Reference
```ini
# WAL generation
wal_level = replica                  # or logical if CDC is needed
full_page_writes = on                # never disable

# WAL buffering
wal_buffers = 64MB                   # increase from default for write-heavy workloads
wal_writer_delay = 200ms             # default is fine

# Checkpoint behavior
checkpoint_timeout = 15min           # reduce checkpoint frequency
max_wal_size = 4GB                   # for high-write systems, go higher
min_wal_size = 1GB
checkpoint_completion_target = 0.9

# Durability
synchronous_commit = on              # never disable globally
# Use SET LOCAL synchronous_commit = off for specific high-freq writes

# Replication retention
wal_keep_size = 2GB                  # safety buffer for standbys without slots

# Archive (if using PITR)
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'
archive_timeout = 60                 # force a segment switch at least every 60s while there is new WAL to archive
```
The Production Incident: WAL Archiving Stall Causing 6-Hour Replica Lag
Context: A blockchain indexer processing ~2,000 TPS with a primary and two physical standbys.
What happened:
An archive_command was configured to copy WAL segments to an NFS mount. During a network partition, the NFS mount became unresponsive. The archive_command hung — it did not fail, it just never returned.
Postgres was waiting for the archive command to complete before allowing WAL segment recycling. But with the archive command hanging, no segments could be recycled. pg_wal/ grew until it hit the disk limit, at which point Postgres paused WAL writing.
Paused WAL writing means no commits can complete. The primary appeared to hang. All application writes stalled.
The standbys: they had received WAL up to the pause point and were applying it, but since the primary was paused, their apply caught up quickly — and then they were waiting for new WAL. Replication appeared healthy. The lag was invisible until the primary recovered.
After the partition healed: The archive command eventually completed or timed out. Postgres resumed WAL writing. But the WAL sender had to catch up standbys on all the writes that piled up during the stall — leading to 6 hours of visible replication lag as the standbys processed the backlog.
The fixes:
```ini
# 1. Set a timeout on the archive command
archive_command = 'timeout 30 cp %p /mnt/wal_archive/%f'
```
```sql
-- 2. Monitor archive status: check for archive failures
SELECT
  archived_count,
  last_archived_wal,
  last_archived_time,
  failed_count,
  last_failed_wal,
  last_failed_time,
  stats_reset
FROM pg_stat_archiver;
```
```ini
# 3. Set a maximum archive wait time
# If archive fails, Postgres retries: add alerting on last_failed_time
```
The lesson: WAL archiving stalls are silent until they're catastrophic. pg_stat_archiver.failed_count incrementing without alerting is a ticking clock toward a primary outage.
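One way to turn pg_stat_archiver into an alert is sketched below in Python. It checks two things: a failure newer than the last success, and a long silence since the last success. The second check matters because a hung archive command, as in this incident, never returns and so may never increment failed_count. The function name and the five-minute threshold are assumptions; on an idle system with no WAL to archive, the silence check needs a higher threshold.

```python
from datetime import datetime, timedelta, timezone


def archiver_alerts(now, last_archived_time, last_failed_time,
                    max_silence=timedelta(minutes=5)):
    """Return a list of alert messages based on pg_stat_archiver timestamps."""
    alerts = []
    if last_failed_time is not None and (
        last_archived_time is None or last_failed_time > last_archived_time
    ):
        alerts.append("archive_command is failing: last failure is newer than last success")
    if last_archived_time is None or now - last_archived_time > max_silence:
        alerts.append("no WAL segment archived recently: archive_command may be hung")
    return alerts


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # fixed sample timestamp
healthy = archiver_alerts(now, now - timedelta(seconds=30), None)
hung = archiver_alerts(now, now - timedelta(hours=1), None)
failing = archiver_alerts(now, now - timedelta(hours=1), now - timedelta(minutes=1))
print(healthy, hung, failing, sep="\n")
```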
Summary
| Concept | Key Takeaway |
|---|---|
| WAL purpose | Sequential writes for durability; heap pages written lazily in background |
| Commit durability | WAL flushed to disk before COMMIT returns (by default). Heap page can wait. |
| Full page writes | First write to each page post-checkpoint writes full 8KB to WAL — major write amplification driver |
| Checkpoints | Bound recovery time by flushing all dirty pages; too-frequent checkpoints = excessive FPW overhead |
| synchronous_commit | off trades a bounded window of durability (up to 3 × wal_writer_delay, 600ms at the default) for commit latency. Acceptable for some workloads. |
| Write amplification | A single UPDATE touches WAL buffer, WAL file, heap buffer, heap file, index buffer, index file, VM |
| Streaming replication | Standby receives WAL stream and replays it in perpetual recovery mode |
| Replication lag | Three components: network lag, flush lag, apply lag — measure each independently |
| Replication slots | Retain WAL for consumers; stalled slots grow pg_wal without bound |
| WAL archiving | archive_command must have a timeout; stalls silently cause disk exhaustion |
WAL is the foundation everything else in Postgres is built on — replication, PITR, crash recovery, and even some of the MVCC mechanics you saw in Module 2. Module 4 goes into the process that runs on top of MVCC and WAL to keep your database healthy: Autovacuum.
Next: Module 4 — Autovacuum: The Process Everyone Misconfigures →