The ALTER TABLE lock matrix, CREATE INDEX CONCURRENTLY, NOT VALID + VALIDATE CONSTRAINT two-phase pattern, adding NOT NULL columns safely on 500M-row tables, column rename via view aliasing, lock_timeout + retry in migration tools, and the exact migration sequences that have taken production sites down.
P-10 — Zero-Downtime Schema Migrations
The most dangerous moment in any PostgreSQL application's life is not when the database is under load — it's when someone runs an ALTER TABLE on a 500-million-row table during business hours. The table locks. Queries queue. The queue hits connection limits. The application goes down. This module is the playbook for never letting that happen.
The ALTER TABLE Lock Matrix
Every schema change acquires a lock. Most acquire AccessExclusiveLock — the most restrictive lock, which blocks ALL reads and writes for the duration. Know what locks what:
| Operation | Lock Level | Blocks reads? | Blocks writes? | Safe under load? |
|---|---|---|---|---|
CREATE INDEX CONCURRENTLY | ShareUpdateExclusiveLock | No | No | Yes |
CREATE INDEX | ShareLock | No | Yes | No |
ADD COLUMN (nullable, no default) | AccessExclusiveLock | Yes | Yes | Brief |
ADD COLUMN ... DEFAULT ... (PG11+) | AccessExclusiveLock | Yes | Yes | Brief |
ADD COLUMN ... NOT NULL (no default) | AccessExclusiveLock | Yes | Yes | No — rewrites table |
ALTER COLUMN TYPE | AccessExclusiveLock | Yes | Yes | No — rewrites table |
ADD CONSTRAINT ... NOT VALID | ShareUpdateExclusiveLock | No | No | Yes |
VALIDATE CONSTRAINT | ShareUpdateExclusiveLock | No | No | Yes |
DROP COLUMN | AccessExclusiveLock | Yes | Yes | Brief (mark only) |
RENAME COLUMN | AccessExclusiveLock | Yes | Yes | No |
RENAME TABLE | AccessExclusiveLock | Yes | Yes | No |
"Brief" means the lock duration is proportional to the table's metadata size, not its row count. Still dangerous under high concurrency because queued queries accumulate. A 50ms metadata-only lock on a table receiving 2,000 queries/second can queue 100 requests before it releases.
CREATE INDEX CONCURRENTLY — Safe Indexing on Live Tables
Standard CREATE INDEX acquires a ShareLock that blocks all writes for the entire duration. On a 100M row table, that's 5–30 minutes of blocked writes.
CREATE INDEX CONCURRENTLY builds the index without blocking:
sql-- WRONG — blocks all writes for minutes on large tables CREATE INDEX idx_orders_user_id ON orders(user_id); -- CORRECT — builds the index without blocking writes CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
The mechanics: CONCURRENTLY makes two passes over the table. Between passes, it waits for any transactions that started before the first pass to complete. Total time is 2–3x longer than standard CREATE INDEX, but zero write blocking.
The failure case: If CREATE INDEX CONCURRENTLY fails mid-build (a unique constraint violation discovered during the build, a statement timeout, a killed connection), it leaves an INVALID index:
sql-- Find INVALID indexes SELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size FROM pg_indexes JOIN pg_class ON relname = indexname JOIN pg_index ON indexrelid = pg_class.oid WHERE indisvalid = false;
An INVALID index is invisible to the query planner but still accumulates write overhead — every INSERT, UPDATE, DELETE on the table touches the INVALID index. Drop it immediately:
sqlDROP INDEX CONCURRENTLY idx_orders_user_id; -- drop also non-blocking CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
Always check for INVALID indexes after your migration pipeline runs. Make it part of your post-migration health check.
Adding NOT NULL Columns Safely
Pre-PostgreSQL 11: Adding any column with a default value rewrites the entire table — every row gets the new column with the default value written to disk. On a 500M row table, this holds AccessExclusiveLock for hours.
PostgreSQL 11+: Adding a column with a NOT NULL DEFAULT that is a constant (not a function call, not now(), not gen_random_uuid()) uses metadata-only storage. The default is stored in the table's catalog, not in each row. The ALTER is instant regardless of table size.
sql-- PG11+ with a constant default — instant, no table rewrite ALTER TABLE orders ADD COLUMN processed boolean NOT NULL DEFAULT false; -- PG11+ with a non-constant default — still rewrites the table ALTER TABLE orders ADD COLUMN created_at timestamptz NOT NULL DEFAULT now(); -- Dangerous on large tables. now() is volatile, so PG must write the value to each row. -- Safe pattern for volatile defaults: add nullable first, backfill, then constrain ALTER TABLE orders ADD COLUMN created_at timestamptz; -- instant UPDATE orders SET created_at = NOW() WHERE created_at IS NULL; -- batch backfill ALTER TABLE orders ALTER COLUMN created_at SET NOT NULL; -- then constrain
The three-step pattern acquires AccessExclusiveLock twice — but both times for only metadata operations. The expensive work (scanning and writing rows) happens in the UPDATE, which holds no table lock beyond normal row-level locks.
For very large tables, batch the UPDATE:
sql-- Batch backfill to avoid long-running transactions and lock contention DO $$ DECLARE batch_size INT := 10000; last_id BIGINT := 0; max_id BIGINT; BEGIN SELECT MAX(id) INTO max_id FROM orders; WHILE last_id < max_id LOOP UPDATE orders SET created_at = NOW() WHERE id > last_id AND id <= last_id + batch_size AND created_at IS NULL; last_id := last_id + batch_size; PERFORM pg_sleep(0.01); -- 10ms pause between batches to let autovacuum keep up END LOOP; END $$;
The NOT VALID + VALIDATE CONSTRAINT Two-Phase Pattern
Adding a foreign key or check constraint with plain ADD CONSTRAINT scans every existing row to verify the constraint holds. On large tables, this holds ShareRowExclusiveLock (blocks concurrent inserts and updates) for minutes.
The two-phase approach:
Phase 1: Add the constraint as NOT VALID — no historical rows are checked, only rows written after this point are validated. This is fast regardless of table size.
Phase 2: Validate separately — VALIDATE CONSTRAINT checks existing rows using ShareUpdateExclusiveLock, which allows concurrent reads and most writes.
sql-- Phase 1: add constraint without scanning historical rows (instant) ALTER TABLE orders ADD CONSTRAINT fk_orders_users FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID; -- Phase 2: validate existing rows (non-blocking for reads, can run during business hours) -- This takes time proportional to table size, but doesn't block reads ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_users;
Why does Phase 2 work? VALIDATE CONSTRAINT uses ShareUpdateExclusiveLock which is compatible with concurrent reads and DML. It does a full table scan but doesn't block your application. Run it during off-peak hours if the table is very large.
The same pattern applies to CHECK constraints:
sql-- Add constraint without scanning existing rows ALTER TABLE products ADD CONSTRAINT chk_price_positive CHECK (price > 0) NOT VALID; -- Validate later ALTER TABLE products VALIDATE CONSTRAINT chk_price_positive;
After Phase 2 completes, the constraint is fully enforced for all rows — both existing and future.
Renaming Columns Safely — The Expand/Contract Pattern
RENAME COLUMN acquires AccessExclusiveLock and breaks any code that references the old column name. Even within a maintenance window, you have to deploy application code atomically with the rename — impossible without downtime unless your deployment is instantaneous.
The zero-downtime rename uses the expand/contract pattern: four deployment steps across days or weeks.
Step 1: Add the new column (nullable)
sqlALTER TABLE users ADD COLUMN full_name text;
This is instant. The old columns (first_name, last_name) are untouched. Application code continues to work.
Step 2: Write to both columns in application code (deploy)
javascript// Your application now writes to both old and new columns await db.query( 'INSERT INTO users (first_name, last_name, full_name) VALUES ($1, $2, $3)', [firstName, lastName, `${firstName} ${lastName}`] );
Deploy this change. All new rows now have both the old columns and the new column populated.
Step 3: Backfill the new column for existing rows
sqlUPDATE users SET full_name = first_name || ' ' || last_name WHERE full_name IS NULL;
Run during off-peak hours. For large tables, batch as shown earlier.
Step 4: Add NOT NULL constraint (after backfill completes)
sqlALTER TABLE users ALTER COLUMN full_name SET NOT NULL;
Step 5: Update application to read from new column only (deploy)
Remove all references to first_name and last_name from your read paths. Leave the write path touching all three columns until the next step.
Step 6: Remove the old columns (deploy)
sqlALTER TABLE users DROP COLUMN first_name, DROP COLUMN last_name;
Four deployment steps. No downtime. No application-visible lock during any step except the brief metadata-only ALTER TABLE operations.
The same pattern applies to table renames. Never RENAME TABLE in production. Instead, create a view with the old name that points to the new table, migrate code to use the new name, then drop the view.
lock_timeout — Your Safety Net
Every migration that touches a large table should set lock_timeout to prevent queue accumulation:
sql-- Fail immediately if a lock cannot be acquired within 2 seconds -- rather than blocking indefinitely and causing query queue buildup SET lock_timeout = '2s'; ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'pending';
The queuing problem in detail: Without lock_timeout, a migration waiting for a lock will queue behind existing transactions. PostgreSQL's lock queue is FIFO — every subsequent query that tries to touch that table also queues behind the migration. A migration waiting 30 seconds for a lock can accumulate hundreds of queued connections, exhausting max_connections and bringing down the application. The migration itself may not even be blocking — it's the queue it creates.
With lock_timeout = '2s', the migration fails rather than queue. You retry during a quieter window. The application stays up.
In migration tools, enforce this as a connection-level default:
sql-- Add to every migration connection via a beforeMigrate callback or init SQL SET lock_timeout = '3s'; SET statement_timeout = '60s';
yaml# Flyway: flyway.conf flyway.initSql=SET lock_timeout='3s'; SET statement_timeout='60s';
statement_timeout is your backup: if the migration itself runs longer than expected (a table rewrite on a table you didn't know was 200GB), it kills the migration rather than holding locks indefinitely.
Column Type Changes Without Table Rewrites
ALTER COLUMN TYPE rewrites the entire table. For large tables, this is an hours-long AccessExclusiveLock. The workaround: new column, backfill, swap.
Pattern: New column + backfill + swap
sql-- Example: changing varchar(50) to text -- Step 1: add new column ALTER TABLE products ADD COLUMN description_new text; -- Step 2: backfill (batch for large tables) UPDATE products SET description_new = description; -- Step 3: swap names in a single transaction (two brief metadata locks) BEGIN; ALTER TABLE products RENAME COLUMN description TO description_old; ALTER TABLE products RENAME COLUMN description_new TO description; COMMIT; -- Step 4: validate nothing broke, then drop the old column ALTER TABLE products DROP COLUMN description_old;
The rename transaction holds AccessExclusiveLock for both renames, but since they're metadata-only operations on the catalog, the lock is held for milliseconds — not proportional to table size.
Pattern: varchar length increases are free
Increasing varchar(N) length in PostgreSQL is a metadata-only operation — no table rewrite:
sql-- Increasing varchar length — instant, no table rewrite ALTER TABLE users ALTER COLUMN email TYPE varchar(255); -- was varchar(100), now varchar(255) — PostgreSQL only updates pg_attribute
Decreasing length requires a constraint check (full table scan) — use the NOT VALID pattern. Changing to a fundamentally different type (e.g., integer to text) always rewrites the table — use the new column + backfill pattern.
Migration Tool Configuration for Safety
Enforce safe practices at the tooling level so individual developers don't have to remember.
yaml# Flyway flyway.conf flyway.lockRetryCount=3 flyway.connectRetries=5 # Enforce timeouts on every migration connection flyway.initSql=SET lock_timeout='3s'; SET statement_timeout='120s';
typescript// Prisma: use pgbouncer=true in connection string for pooled environments, // and shadow database for migration testing // prisma/schema.prisma datasource db { provider = "postgresql" url = env("DATABASE_URL") shadowDatabaseUrl = env("SHADOW_DATABASE_URL") }
javascript// node-postgres: set lock_timeout on the migration connection const migrationClient = new Client({ connectionString: process.env.DATABASE_URL }); await migrationClient.connect(); await migrationClient.query("SET lock_timeout = '3s'"); await migrationClient.query("SET statement_timeout = '120s'"); // run migrations...
For CI/CD pipelines, test migrations against a production-sized copy of the database before applying to production. pg_dump --schema-only + restore + pg_dump --data-only of a sample is often sufficient to catch "this alter will rewrite the table" surprises.
Detecting Lock Waits During Migrations
While a migration is running, monitor pg_stat_activity in a separate session to catch lock pile-ups early:
sql-- Are any queries waiting for a lock? SELECT pid, now() - query_start AS wait_duration, state, wait_event_type, wait_event, left(query, 80) AS query FROM pg_stat_activity WHERE wait_event_type = 'Lock' ORDER BY wait_duration DESC;
sql-- What is holding the lock that others are waiting for? SELECT blocking.pid AS blocking_pid, blocking.query AS blocking_query, blocked.pid AS blocked_pid, blocked.query AS blocked_query, now() - blocked.query_start AS blocked_duration FROM pg_stat_activity blocked JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid)) WHERE blocked.wait_event_type = 'Lock';
If you see this filling up during a migration, the migration is holding a lock that other queries are queuing behind. Kill the migration and investigate before retrying.
The Migration Checklist
Before running any migration on a production table with more than 1 million rows:
[ ] Identify the lock type and duration (check the ALTER TABLE lock matrix)
[ ] Set lock_timeout and statement_timeout on the migration connection
[ ] Test on a production-sized copy first — surprises at 2am are unacceptable
[ ] Schedule for the lowest-traffic window your SLAs allow
[ ] Disable application autoscaling during migration (new instances connecting = more lock contention)
[ ] Have a tested rollback script ready before you start
[ ] Monitor pg_stat_activity in a separate session during execution
[ ] Alert on connection count spike (cl_waiting in PgBouncer, or pg_stat_activity state='idle in transaction')
[ ] For CONCURRENTLY operations: check pg_indexes for indisvalid = false after completion
[ ] For NOT VALID constraints: schedule the VALIDATE CONSTRAINT step separately
The 3am call you're trying to avoid is the one where the migration ran fine in staging (10k rows), then blocked production (500M rows) for 40 minutes while the on-call engineer frantically looked for the pg_cancel_backend command. This checklist exists because that call happens.
Summary
| Pattern | When to use | Lock held |
|---|---|---|
CREATE INDEX CONCURRENTLY | Adding any index to a live table | ShareUpdateExclusiveLock (non-blocking) |
ADD COLUMN ... DEFAULT constant (PG11+) | Adding column with a fixed default | AccessExclusiveLock, milliseconds |
| Add nullable + backfill + SET NOT NULL | Adding column with volatile default | Metadata locks only; backfill is row-level |
NOT VALID + VALIDATE CONSTRAINT | Adding FK or check constraint | Phase 1: instant; Phase 2: ShareUpdateExclusiveLock |
| Expand/contract (new col + backfill + rename + drop) | Renaming columns or tables | Metadata locks only |
| New col + backfill + swap | Changing column type | Metadata locks only |
lock_timeout | Every migration touching large tables | Prevents queue accumulation |
Zero-downtime migrations are not about clever hacks — they are about understanding exactly what PostgreSQL locks when, and structuring your changes to never hold an exclusive lock longer than milliseconds.
Next: Module 17 — Logical Replication and CDC Pipelines →