The B-tree index for practitioners, EXPLAIN basics, composite and partial indexes, and the write cost trade-off.
P-2 — Indexes: When and How to Add Them
Who this module is for: You have built schemas and written queries. Now a query that was fast with 100 rows is slow with 100,000. This module explains what indexes actually do (without diving into the B-tree internals — that is Phase 3), how to tell if you need one, how to create and use them correctly, and the cost that most tutorials never mention: every index makes writes slower.
What an Index Actually Does
When you run SELECT * FROM products WHERE price = 49.99, PostgreSQL has two options:
Sequential scan — read every single row from the table and check whether price = 49.99. If you have 1,000,000 rows, it reads all 1,000,000.
Index scan — jump directly to the rows where price = 49.99 using a pre-built lookup structure. If 3 rows match, it reads roughly 3 rows plus the index overhead.
An index is a separate data structure maintained by PostgreSQL that maps column values to the physical locations of rows. Think of it like a book's index: instead of reading every page to find "PostgreSQL", you look it up in the index and go directly to the pages listed.
The core tradeoff: an index makes SELECT faster but makes INSERT, UPDATE, and DELETE slower — because every write must update both the table and the index. Adding an index to every column is not a good strategy.
Reading EXPLAIN — Your Diagnostic Tool
Before adding an index, measure. EXPLAIN shows the query execution plan PostgreSQL would use.
sql-- Show the plan without executing the query EXPLAIN SELECT * FROM tasks WHERE status = 'todo'; -- Show the plan AND execute, with actual timing EXPLAIN ANALYZE SELECT * FROM tasks WHERE status = 'todo'; -- Full diagnostic output EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM tasks WHERE status = 'todo';
Reading the output:
Seq Scan on tasks (cost=0.00..35.50 rows=234 width=128)
(actual time=0.012..0.487 rows=234 loops=1)
Filter: (status = 'todo')
Rows Removed by Filter: 316
Planning Time: 0.142 ms
Execution Time: 0.532 ms
| Field | Meaning |
|---|---|
Seq Scan | Sequential scan — reading every row |
cost=0.00..35.50 | Estimated cost (startup..total, arbitrary units) |
rows=234 | Estimated row count |
actual time=0.012..0.487 | Real time in milliseconds (start..end) |
Rows Removed by Filter: 316 | How many rows were read but discarded |
Planning Time | Time to generate the plan |
Execution Time | Total actual execution time |
After adding an index:
Index Scan using idx_tasks_status on tasks
(cost=0.29..8.31 rows=234 width=128)
(actual time=0.018..0.089 rows=234 loops=1)
Index Cond: (status = 'todo')
Planning Time: 0.234 ms
Execution Time: 0.112 ms
The Seq Scan became an Index Scan. Execution time dropped from 0.532ms to 0.112ms — about 5× faster on a small table. On a million-row table, the difference would be far more dramatic.
Creating Indexes
Basic Index
sql-- Create an index on a single column CREATE INDEX idx_tasks_status ON tasks (status); -- Index name convention: idx_{table}_{column(s)} -- Descriptive names help you understand what each index is for
CREATE INDEX CONCURRENTLY — No Table Lock
Regular CREATE INDEX locks the table from writes for the duration of the build. On a large production table, this can take minutes and block your application.
sql-- Build the index without locking the table (safe for production) CREATE INDEX CONCURRENTLY idx_tasks_status ON tasks (status); -- Takes longer to build, but reads and writes continue normally -- Use this for any table with live traffic
Unique Index
sql-- Enforces uniqueness AND provides fast lookups CREATE UNIQUE INDEX idx_users_email ON users (email); -- Equivalent to: UNIQUE constraint (which creates an index automatically)
Dropping an Index
sqlDROP INDEX idx_tasks_status; DROP INDEX CONCURRENTLY idx_tasks_status; -- without locking writes
Composite Indexes — Column Order Matters
A composite index on (a, b) can be used for queries filtering on a alone, or a AND b together. It cannot efficiently answer queries on b alone.
sqlCREATE INDEX idx_tasks_project_status ON tasks (project_id, status); -- ✅ Uses the index (leading column: project_id) SELECT * FROM tasks WHERE project_id = 1; -- ✅ Uses the index (both columns) SELECT * FROM tasks WHERE project_id = 1 AND status = 'todo'; -- ❌ Does NOT use the index efficiently (missing leading column) SELECT * FROM tasks WHERE status = 'todo'; -- PostgreSQL will do a sequential scan or a separate index if one exists
Rule: put the equality filter column first, the range filter column second.
sql-- For a query like: WHERE status = 'todo' AND price > 50 -- Correct composite index: CREATE INDEX idx_products_status_price ON products (status, price); -- status = 'todo' narrows to a small set, then price > 50 filters within it
Partial Indexes — Index Only the Rows You Query
A partial index only includes rows matching a WHERE condition. It is smaller, faster to build, and uses less memory than a full index.
sql-- Only index open tasks (the ones you query most often) CREATE INDEX idx_tasks_open ON tasks (project_id, created_at) WHERE status IN ('todo', 'in_progress'); -- Completed and cancelled tasks are excluded — they are rarely queried -- Only index active users CREATE INDEX idx_users_active_email ON users (email) WHERE deleted_at IS NULL;
For a table where 95% of rows are in terminal states (done, archived), a partial index on the active 5% is dramatically smaller and faster.
Expression Indexes — Index a Computed Value
sql-- Case-insensitive email lookups CREATE INDEX idx_users_email_lower ON users (LOWER(email)); -- The query must use the same expression to use this index: SELECT * FROM users WHERE LOWER(email) = LOWER('Alice@Example.com'); -- ✅ Uses idx_users_email_lower SELECT * FROM users WHERE email = 'alice@example.com'; -- ❌ Does NOT use it (different expression)
Foreign Key Indexes — The Most Forgotten Optimization
PostgreSQL does not automatically create indexes on foreign key columns. This surprises many engineers. When you delete a row from the parent table, PostgreSQL must scan the child table to check for references — without an index, this is a full sequential scan.
sql-- Foreign keys in tasks ALTER TABLE tasks ADD CONSTRAINT tasks_project_fk FOREIGN KEY (project_id) REFERENCES projects(id); ALTER TABLE tasks ADD CONSTRAINT tasks_assigned_fk FOREIGN KEY (assigned_to) REFERENCES users(id); -- These indexes are NOT created automatically — add them manually: CREATE INDEX idx_tasks_project_id ON tasks (project_id); CREATE INDEX idx_tasks_assigned_to ON tasks (assigned_to); CREATE INDEX idx_comments_task_id ON comments (task_id); CREATE INDEX idx_comments_author ON comments (author_id);
Rule: every foreign key column should have an index. Without it, joins and deletions that reference that column are slow.
The Write Cost of Indexes
Every index you add slows down every INSERT, UPDATE, and DELETE on that table.
A table with no indexes:
INSERT: write 1 place (the table)
UPDATE: write 1 place
DELETE: write 1 place
The same table with 5 indexes:
INSERT: write 6 places (table + 5 indexes)
UPDATE: write up to 6 places (table + any index whose column changed)
DELETE: write 6 places
For a table that has 100,000 reads per second but only 10 writes per second, many indexes are fine. For a table that has 50,000 writes per second (like an event log or metrics table), every index is expensive.
This is why you should not add an index "just in case." Measure first, add indexes for proven slow queries.
Common Index Mistakes
Mistake 1: Indexing a low-cardinality column
sql-- A 'status' column with 4 possible values has low cardinality -- An index on it is often useless when many rows match CREATE INDEX idx_orders_status ON orders (status); -- If 'confirmed' represents 80% of rows, PostgreSQL ignores the index -- and does a sequential scan — it is cheaper to scan than to use the index -- when the result set is large -- FIX: use a partial index for the selective values only CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending'; -- Indexes only pending orders (maybe 2%) — highly selective
Mistake 2: Over-indexing a write-heavy table
sql-- A logging table receiving 10,000 inserts/second -- Every index you add costs you on every insert -- BAD: indexing everything CREATE INDEX idx_events_user_id ON events (user_id); CREATE INDEX idx_events_type ON events (event_type); CREATE INDEX idx_events_created_at ON events (created_at); CREATE INDEX idx_events_session_id ON events (session_id); -- 4 extra writes per INSERT, 10,000 inserts/sec = 40,000 extra write ops/sec
Mistake 3: Indexing a column used in a function
sql-- The index is NOT used because the function wraps the column CREATE INDEX idx_users_email ON users (email); SELECT * FROM users WHERE LOWER(email) = 'alice@example.com'; -- ❌ Doesn't use the index (LOWER() prevents it) -- FIX: create an expression index CREATE INDEX idx_users_email_lower ON users (LOWER(email)); SELECT * FROM users WHERE LOWER(email) = 'alice@example.com'; -- ✅ Now uses the expression index
Mistake 4: Not running ANALYZE after bulk loads
sql-- After inserting millions of rows, PostgreSQL's statistics are stale -- The planner may choose sequential scan even when an index exists INSERT INTO events SELECT ... FROM staging; -- insert 5 million rows -- Run ANALYZE to update statistics ANALYZE events; -- Now the planner knows the table is large and uses indexes appropriately
Monitoring Index Usage
sql-- Are your indexes actually being used? SELECT schemaname, tablename, indexname, idx_scan AS times_used, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size FROM pg_stat_user_indexes ORDER BY idx_scan ASC; -- Low idx_scan = index may not be needed -- Consider dropping unused indexes (they still slow down writes) -- Table statistics: sequential scans vs index scans SELECT relname AS table, seq_scan, idx_scan, n_live_tup AS row_count FROM pg_stat_user_tables ORDER BY seq_scan DESC; -- High seq_scan on large tables = potentially missing indexes
Practical Exercise: Indexing the Task Manager
Starting with the task manager schema from F-7 with 50,000 rows seeded:
sql-- Run these EXPLAIN queries and note the execution plans -- Q1: Are we doing sequential scans? EXPLAIN ANALYZE SELECT * FROM tasks WHERE project_id = 42 AND status = 'todo'; -- Likely: Seq Scan (no index yet) -- Add the right index CREATE INDEX CONCURRENTLY idx_tasks_project_status ON tasks (project_id, status); -- Run EXPLAIN again — should now show Index Scan EXPLAIN ANALYZE SELECT * FROM tasks WHERE project_id = 42 AND status = 'todo'; -- Q2: Add indexes for common query patterns -- Tasks assigned to a user, ordered by due date CREATE INDEX CONCURRENTLY idx_tasks_assigned_due ON tasks (assigned_to, due_date) WHERE status NOT IN ('done', 'cancelled'); -- Q3: Comments by task (very common query) CREATE INDEX CONCURRENTLY idx_comments_task_created ON comments (task_id, created_at); -- Q4: User lookup by email (login flow) CREATE INDEX CONCURRENTLY idx_users_email_lower ON users (LOWER(email)); -- Q5: Check that foreign keys are indexed CREATE INDEX CONCURRENTLY idx_tasks_project_id ON tasks (project_id); CREATE INDEX CONCURRENTLY idx_tasks_assigned_to ON tasks (assigned_to); CREATE INDEX CONCURRENTLY idx_comments_task_id ON comments (task_id); CREATE INDEX CONCURRENTLY idx_comments_author ON comments (author_id); CREATE INDEX CONCURRENTLY idx_project_members_user ON project_members (user_id); -- Q6: Verify your indexes \d tasks -- Should show all the indexes you created
Summary
| Concept | Key Takeaway |
|---|---|
| Sequential scan | Reads every row — fine for small tables or when most rows match |
| Index scan | Jumps directly to matching rows — fast for selective queries |
EXPLAIN ANALYZE | Always measure before and after adding an index |
CREATE INDEX CONCURRENTLY | Build index without blocking writes — use in production |
| Composite index | Column order matters: equality columns first, range columns last |
| Partial index | Only indexes rows matching a condition — smaller and faster |
| Expression index | Index a computed value (LOWER(email)) |
| Foreign key indexes | Not automatic — add manually for every FK column |
| Write cost | Every index slows down INSERT/UPDATE/DELETE |
| Low cardinality | Low-cardinality columns (status with 4 values) rarely benefit from full indexes |
| Index monitoring | pg_stat_user_indexes.idx_scan = 0 means the index is never used — drop it |
Module P-3 covers transactions and ACID in practice — what BEGIN, COMMIT, ROLLBACK actually do, isolation levels, and writing safe atomic operations for scenarios like bank transfers and inventory deduction.
Next: P-3 — Transactions and ACID in Practice →