← Courses
🐘

Documentation Course

PostgreSQL In-Depth

From First Query to Production Architecture

Start Reading

A complete A-Z PostgreSQL curriculum — from absolute basics for beginners through production patterns for mid-level engineers to the deep internals that only senior engineers ever learn. Built from years of running PostgreSQL at 20–30 TB scale on a live blockchain indexer.

35 modulesBeginner to AdvancedWritten / Documentation-style35 of 35 modules live

How to use this course

This course works as both a sequential read and a standalone reference. Read front-to-back to build a complete mental model of Postgres under production load. Or jump to any module when you hit a specific incident — VACUUM bloat, a lock queue outage, a query regression after a data load.

Total reading time

~15 hrs

across 35 modules

Built from

20–30 TB

live blockchain indexer

Prerequisite

Zero SQL to production internals

Beginner through senior engineer

Phase 2 — Practitioner

Production patterns · Real application architecture

10 modules
P-1
Advanced SQL — The Patterns You Will Use Every Week
CTEs, window functions, upserts, subqueries, UNION — the SQL that separates proficient engineers from beginners.
P-2
Indexes — When and How to Add Them
The B-tree index for practitioners, EXPLAIN basics, composite and partial indexes, and the write cost trade-off.
P-3
Transactions and ACID in Practice
BEGIN, COMMIT, ROLLBACK, isolation levels, and safe atomic operations — what every production application must understand.
P-4
Schema Design for Real Applications
Normalisation, denormalisation trade-offs, correct data types for money and time, soft deletes, and schema migration tools.
P-5
JSON and JSONB — Working With Semi-Structured Data
JSONB vs JSON, containment operators, GIN indexes, and when JSONB is the right tool vs. a schema design shortcut.
P-6
Authentication, Row-Level Security, and Access Control
Roles, privileges, Row-Level Security policies, and multi-tenant data isolation — the production access model.
P-7
Full-Text Search
tsvector, tsquery, GIN indexes, relevance ranking, and generated tsvector columns — search without Elasticsearch.
P-8
Performance Tuning for Application Engineers
EXPLAIN ANALYZE for practitioners, key config parameters, N+1 queries, PgBouncer basics, and slow query logging.
P-9
External Services, Caching Layers, and Deployment
ORMs vs. raw SQL, read replicas, materialised views, managed databases, and a production-ready deployment checklist.
P-10
Zero-Downtime Schema Migrations
The ALTER TABLE lock matrix, CREATE INDEX CONCURRENTLY, NOT VALID + VALIDATE CONSTRAINT two-phase pattern, adding NOT NULL columns safely on 500M-row tables, column rename via view aliasing, lock_timeout + retry in migration tools, and the exact migration sequences that have taken production sites down.

Phase 3 — Architect

Engine internals · High-throughput systems · Distributed architecture

18 modules
0
Before You Proceed: Mental Model Reset
Why most engineers have a shallow model of Postgres — and what it costs them in production.
1
The Storage Engine: Pages, Heaps, and the True Cost of a Row
Where your data actually lives and why every abstraction above this layer has a cost.
2
MVCC: The Architecture That Makes Concurrency Possible (and Expensive)
How an 18-hour idle-in-transaction session caused 17 GB of dead tuple bloat and 6 hours of replica lag — and the MVCC mechanics behind it.
3
Write-Ahead Logging: Durability, Replication, and the Price of Every Write
Every byte you write passes through WAL. Understanding WAL is understanding your write amplification.
4
Autovacuum: The Process Everyone Misconfigures
How a stalled replication slot pinned OldestXmin for 48 hours, accumulating 800 million dead tuples — and the autovacuum mechanics that let it happen.
5
Indexes: B-Tree Internals, GIN, GiST, and When Each One Hurts You
An index is not a free performance boost. Every index has a write cost, a bloat trajectory, and a planner interaction that can go wrong.
6
Query Planning and Execution: How Postgres Decides What to Do With Your Query
The planner is a cost-based optimizer. Every wrong plan has a root cause in statistics, configuration, or schema design.
7
Schema Design at Scale: Decisions That Cannot Be Undone
Every schema decision on day one compounds at 100x data volume. This module covers the decisions that matter most.
8
PostgreSQL 18: Asynchronous I/O and What It Changes
The most architecturally significant change to Postgres in a decade — the definitive technical breakdown.
9
The RETURNING Clause Evolved: OLD/NEW Aliases and Eliminating Race Conditions
PostgreSQL 18 introduces OLD and NEW in RETURNING. This eliminates entire classes of application-level race conditions.
10
Replication, High Availability, and the CAP Trade-offs
Replication is not a backup. Understanding the consistency guarantees of each replication mode is non-negotiable.
11
SQL vs NoSQL: An Engineer's Framework, Not a Marketing Debate
A structured decision framework using storage engine mechanics, consistency models, and scaling limits.
12
Production Operations: Monitoring, Migration, and the Runbooks That Matter
The difference between junior and senior is not SQL knowledge — it is having the right runbooks before the incident.
13
Locking Internals: Row Locks, Table Locks, and Advisory Locks
Concurrency bugs are almost never about wrong SQL — they are about wrong locking assumptions. The complete PostgreSQL locking matrix.
14
Advanced Partitioning: Pruning, Maintenance, and pg_partman
The operational difference between designing a partition and maintaining 500 of them at scale.
15
The Extensions Ecosystem: pg_cron, TimescaleDB, Citus, and pgvector
How Postgres extensions replace entire categories of specialized databases — without leaving ACID behind.
16
Connection Pooling Failure Modes: PgBouncer, Serverless, and the Edge
The silent bugs that only appear in production: prepared statements, temp tables, and serverless connection spikes.
A-17
Logical Replication and CDC Pipelines
Logical vs physical replication, replication slots and the slot bloat disaster, publication/subscription model, WAL sender process, Debezium + Kafka change data capture, zero-downtime major version upgrades using logical replication as a migration bridge, and replication identity modes.