🐘

Documentation Course

PostgreSQL In-Depth

From First Query to Production Architecture

A complete A-Z PostgreSQL curriculum — from absolute basics for beginners through production patterns for mid-level engineers to the deep internals that only senior engineers ever learn. Built from years of running PostgreSQL at 20–30 TB scale on a live blockchain indexer.

35 modulesBeginner to AdvancedWritten / Documentation-style35 of 35 modules live

How to use this course

This course works as both a sequential read and a standalone reference. Read front-to-back to build a complete mental model of Postgres under production load. Or jump to any module when you hit a specific incident — VACUUM bloat, a lock queue outage, a query regression after a data load.

Total reading time

~15 hrs

across 35 modules

Built from

20–30 TB

live blockchain indexer

Prerequisite

Zero SQL to production internals

Beginner through senior engineer

Phase 1 — Foundation

Absolute basics · No prior knowledge assumed

7 modules

What Is PostgreSQL and Why Does It Exist?

The problem databases solve, the client-server model, installation, and your first database — zero prior knowledge assumed.

Tables, Rows, and the Relational Mental Model

Data types, CREATE TABLE, INSERT, SELECT, UPDATE, DELETE — every fundamental operation from first principles.

Filtering, Sorting, and Finding What You Need

WHERE, ORDER BY, LIMIT, OFFSET, string functions, and date arithmetic — the tools you will use on every single query.

Aggregation — Summarising Your Data

COUNT, SUM, AVG, GROUP BY, HAVING — the mental model that confuses beginners, explained from scratch.

Connecting Tables — Joins Demystified

INNER JOIN, LEFT JOIN, FULL OUTER JOIN, self-joins, and multi-table queries — the most feared concept made simple.

Constraints and Data Integrity

PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, CHECK — letting the database enforce your rules so application code does not have to.

Your First Real Application Schema

Translating a real-world problem into tables, connecting from application code, and basic backup and restore.

Phase 2 — Practitioner

Production patterns · Real application architecture

10 modules

Advanced SQL — The Patterns You Will Use Every Week

CTEs, window functions, upserts, subqueries, UNION — the SQL that separates proficient engineers from beginners.

Indexes — When and How to Add Them

The B-tree index for practitioners, EXPLAIN basics, composite and partial indexes, and the write cost trade-off.

Transactions and ACID in Practice

BEGIN, COMMIT, ROLLBACK, isolation levels, and safe atomic operations — what every production application must understand.

Schema Design for Real Applications

Normalisation, denormalisation trade-offs, correct data types for money and time, soft deletes, and schema migration tools.

JSON and JSONB — Working With Semi-Structured Data

JSONB vs JSON, containment operators, GIN indexes, and when JSONB is the right tool vs. a schema design shortcut.

Authentication, Row-Level Security, and Access Control

Roles, privileges, Row-Level Security policies, and multi-tenant data isolation — the production access model.

Full-Text Search

tsvector, tsquery, GIN indexes, relevance ranking, and generated tsvector columns — search without Elasticsearch.

Performance Tuning for Application Engineers

EXPLAIN ANALYZE for practitioners, key config parameters, N+1 queries, PgBouncer basics, and slow query logging.

External Services, Caching Layers, and Deployment

ORMs vs. raw SQL, read replicas, materialised views, managed databases, and a production-ready deployment checklist.

Zero-Downtime Schema Migrations

The ALTER TABLE lock matrix, CREATE INDEX CONCURRENTLY, NOT VALID + VALIDATE CONSTRAINT two-phase pattern, adding NOT NULL columns safely on 500M-row tables, column rename via view aliasing, lock_timeout + retry in migration tools, and the exact migration sequences that have taken production sites down.

Phase 3 — Architect

Engine internals · High-throughput systems · Distributed architecture

18 modules

Before You Proceed: Mental Model Reset

Why most engineers have a shallow model of Postgres — and what it costs them in production.

The Storage Engine: Pages, Heaps, and the True Cost of a Row

Where your data actually lives and why every abstraction above this layer has a cost.

MVCC: The Architecture That Makes Concurrency Possible (and Expensive)

How an 18-hour idle-in-transaction session caused 17 GB of dead tuple bloat and 6 hours of replica lag — and the MVCC mechanics behind it.

Write-Ahead Logging: Durability, Replication, and the Price of Every Write

Every byte you write passes through WAL. Understanding WAL is understanding your write amplification.

Autovacuum: The Process Everyone Misconfigures

How a stalled replication slot pinned OldestXmin for 48 hours, accumulating 800 million dead tuples — and the autovacuum mechanics that let it happen.

Indexes: B-Tree Internals, GIN, GiST, and When Each One Hurts You

An index is not a free performance boost. Every index has a write cost, a bloat trajectory, and a planner interaction that can go wrong.

Query Planning and Execution: How Postgres Decides What to Do With Your Query

The planner is a cost-based optimizer. Every wrong plan has a root cause in statistics, configuration, or schema design.

Schema Design at Scale: Decisions That Cannot Be Undone

Every schema decision on day one compounds at 100x data volume. This module covers the decisions that matter most.

PostgreSQL 18: Asynchronous I/O and What It Changes

The most architecturally significant change to Postgres in a decade — the definitive technical breakdown.

The RETURNING Clause Evolved: OLD/NEW Aliases and Eliminating Race Conditions

PostgreSQL 18 introduces OLD and NEW in RETURNING. This eliminates entire classes of application-level race conditions.

Replication, High Availability, and the CAP Trade-offs

Replication is not a backup. Understanding the consistency guarantees of each replication mode is non-negotiable.

SQL vs NoSQL: An Engineer's Framework, Not a Marketing Debate

A structured decision framework using storage engine mechanics, consistency models, and scaling limits.

Production Operations: Monitoring, Migration, and the Runbooks That Matter

The difference between junior and senior is not SQL knowledge — it is having the right runbooks before the incident.

Locking Internals: Row Locks, Table Locks, and Advisory Locks

Concurrency bugs are almost never about wrong SQL — they are about wrong locking assumptions. The complete PostgreSQL locking matrix.

Advanced Partitioning: Pruning, Maintenance, and pg_partman

The operational difference between designing a partition and maintaining 500 of them at scale.

The Extensions Ecosystem: pg_cron, TimescaleDB, Citus, and pgvector

How Postgres extensions replace entire categories of specialized databases — without leaving ACID behind.

Connection Pooling Failure Modes: PgBouncer, Serverless, and the Edge

The silent bugs that only appear in production: prepared statements, temp tables, and serverless connection spikes.

Logical Replication and CDC Pipelines

Logical vs physical replication, replication slots and the slot bloat disaster, publication/subscription model, WAL sender process, Debezium + Kafka change data capture, zero-downtime major version upgrades using logical replication as a migration bridge, and replication identity modes.