Module P-8·27 min read

EXPLAIN ANALYZE for practitioners, key config parameters, N+1 queries, PgBouncer basics, and slow query logging.

P-8 — Performance Tuning for Application Engineers

There is a version of performance tuning that belongs to DBAs: they adjust shared_buffers, tune checkpoint_completion_target, read kernel parameters, and operate at the infrastructure level. That work matters at scale.

There is another version that belongs to application engineers — and it produces far larger wins in far less time. This module covers the second kind: reading EXPLAIN ANALYZE output confidently, eliminating N+1 queries, understanding the configuration knobs that actually matter for applications, and using PgBouncer correctly. These are the skills that turn a slow application into a fast one without needing infrastructure access.

Reading EXPLAIN ANALYZE

EXPLAIN shows the query plan PostgreSQL chose. EXPLAIN ANALYZE executes the query and shows both the plan and the actual runtime measurements. Always use ANALYZE when debugging real performance — the plan alone is often misleading.

sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 20;

Anatomy of an EXPLAIN Output

Limit  (cost=1234.56..1234.61 rows=20 width=48) (actual time=45.123..45.130 rows=20 loops=1)
  ->  Sort  (cost=1234.56..1259.56 rows=10000 width=48) (actual time=45.120..45.125 rows=20 loops=1)
        Sort Key: o.created_at DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Hash Join  (cost=450.00..900.00 rows=10000 width=48) (actual time=12.345..40.234 rows=9876 loops=1)
              Hash Cond: (o.user_id = u.id)
              ->  Seq Scan on orders o  (cost=0.00..350.00 rows=10000 width=32) (actual time=0.012..15.234 rows=9876 loops=1)
                    Filter: ((status)::text = 'pending'::text)
                    Rows Removed by Filter: 90124
              ->  Hash  (cost=200.00..200.00 rows=20000 width=24) (actual time=8.456..8.456 rows=20000 loops=1)
                    Buckets: 32768  Batches: 1  Memory Usage: 1234kB
                    ->  Seq Scan on users u  (cost=0.00..200.00 rows=20000 width=24) (actual time=0.008..5.678 rows=20000 loops=1)
Planning Time: 0.456 ms
Execution Time: 45.234 ms

Key things to read:

cost=X..Y — estimated cost units. The first number is startup cost (cost before first row). The second is total cost. These are planner estimates, not milliseconds.

actual time=X..Y — real milliseconds. First number: time to first row. Second: total time. This is what you actually care about.

rows=N (estimated) vs rows=N (actual) — if these diverge significantly (10x or more), the planner has bad statistics. Run ANALYZE table_name to refresh them.

loops=N — how many times this node executed. If a nested loop runs 10,000 times, multiply the actual time by loops to get true total time.

Rows Removed by Filter — rows that were scanned and discarded. High numbers here mean a sequential scan is doing unnecessary work that an index could avoid.

The Most Important Nodes to Recognise

Seq Scan — full table scan. Not always bad (small tables, high selectivity filters), but suspicious on large tables with a filter.

Index Scan — uses a B-tree index to find rows. Good.

Index Only Scan — all needed data is in the index itself, no heap fetch. Best case.

Bitmap Index Scan + Bitmap Heap Scan — fetches a set of matching row locations via index, then reads those pages. Good for queries returning many rows from an index.

Nested Loop — for each row in the outer relation, look up matching rows in the inner relation. Fine when the outer set is small. Catastrophic when the outer set is large (N+1 pattern).

Hash Join — build a hash table from the smaller relation, probe it with the larger. Good for large joins on equality conditions.

Merge Join — sort both sides and merge. Good when both inputs are already sorted.

The BUFFERS Option

sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 42;

Adding BUFFERS shows cache hits and misses:

Buffers: shared hit=12 read=3

hit — pages served from PostgreSQL's shared buffer cache (fast, no disk I/O)
read — pages read from disk (slow, or from OS page cache)

A query with many read pages on a hot table suggests shared_buffers is too small or the working set doesn't fit in memory.

Identifying Missing Indexes

The most common fix for slow queries is a missing index. Signs to look for in EXPLAIN ANALYZE:

Seq Scan on a large table with Rows Removed by Filter >> actual rows returned
actual time on a Seq Scan node is large
The filter column isn't indexed

sql
-- This query is slow: status is unindexed, table has 500k rows
EXPLAIN ANALYZE
SELECT * FROM orders WHERE status = 'pending';

-- Seq Scan on orders (cost=0.00..12500.00 rows=1000 width=128)
--                    (actual time=0.012..234.567 rows=987 loops=1)
--   Filter: (status = 'pending')
--   Rows Removed by Filter: 499013

-- Fix: add an index
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);

-- Now: Index Scan using idx_orders_status on orders
--      (actual time=0.034..1.234 rows=987 loops=1)

CREATE INDEX CONCURRENTLY builds the index without locking the table — always use it in production.

The N+1 Query Problem

The N+1 problem is the most common performance killer in production applications. It happens when you execute one query to fetch N parent records, then execute one query per record to fetch related data — producing N+1 total queries.

javascript
// Bad: N+1 in application code
const orders = await db.query('SELECT id, user_id FROM orders LIMIT 100');
// 100 queries follow:
for (const order of orders) {
  const user = await db.query('SELECT email FROM users WHERE id = $1', [order.user_id]);
  order.user = user;
}

At 100 orders this is noticeable. At 1000 orders this is a page timeout.

Fix 1: JOIN in SQL

sql
SELECT o.id, o.total, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
LIMIT 100;

One query. The database does the join efficiently using indexes.

Fix 2: IN clause batch fetch

javascript
// Better: two queries total
const orders = await db.query('SELECT id, user_id FROM orders LIMIT 100');
const userIds = orders.map(o => o.user_id);
const users = await db.query('SELECT id, email FROM users WHERE id = ANY($1)', [userIds]);
// Join in application memory

Fix 3: ORM eager loading

In ORMs, N+1 manifests through lazy loading. Most ORMs have an "eager load" or "include" option:

javascript
// Prisma
const orders = await prisma.order.findMany({
  take: 100,
  include: { user: true }  // single JOIN query, not N queries
});

How to Detect N+1 in Production

Enable log_min_duration_statement to catch slow queries. But for N+1, individual queries may each be fast — the problem is volume. Use a query counter middleware or APM tool (Datadog, New Relic) that shows query count per request. Any request making 50+ identical queries is almost certainly N+1.

sql
-- In development: log ALL queries to spot N+1
ALTER SYSTEM SET log_min_duration_statement = 0;
SELECT pg_reload_conf();

-- In production: log queries slower than 100ms
ALTER SYSTEM SET log_min_duration_statement = 100;
SELECT pg_reload_conf();

Key Configuration Parameters for Application Engineers

Most PostgreSQL config lives in postgresql.conf. These are the parameters that meaningfully affect application performance and are safe to tune without DBA-level expertise.

`work_mem`

Memory per sort/hash operation per query. Default is 4MB — often too low.

sql
-- Current value
SHOW work_mem;

-- Set for a specific session (tuning without global impact)
SET work_mem = '64MB';

-- Check if a sort spilled to disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... ORDER BY ...;
-- Look for: Sort Method: external merge  Disk: 4096kB
-- vs:       Sort Method: quicksort       Memory: 128kB

If you see external merge Disk in sort nodes, work_mem is too low for that query. Increase it globally carefully — work_mem is allocated per sort per connection, so 64MB * 200 connections * 3 sorts = 38GB. Set it conservatively globally, and use SET LOCAL work_mem for specific heavy queries.

`shared_buffers`

The size of PostgreSQL's buffer cache — how much data it holds in memory. Default is 128MB, absurdly low for any production server. Rule of thumb: 25% of total RAM.

sql
SHOW shared_buffers;
-- Requires restart to change:
-- shared_buffers = '4GB'  in postgresql.conf

`effective_cache_size`

Not actual memory allocation — this tells the planner how much memory is available for caching (PostgreSQL buffers + OS page cache combined). It affects index vs. sequential scan decisions. Set it to 50-75% of total RAM.

sql
-- This is advisory only, doesn't allocate memory
SET effective_cache_size = '12GB';

`random_page_cost`

The planner's estimate of the cost of a random disk read, relative to sequential reads. Default is 4.0 (spinning disk). On SSDs or cloud storage, set it to 1.1-1.5 — this makes the planner more willing to use index scans.

sql
-- For SSD-backed storage
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

This single change often fixes "planner prefers Seq Scan over Index Scan" complaints on cloud databases.

`max_connections`

The maximum number of concurrent client connections. Default is 100. Each connection consumes memory (~5-10MB for the backend process). The answer to "we need more connections" is almost never to increase max_connections — it's to add a connection pooler.

Connection Pooling with PgBouncer

Each PostgreSQL connection is a dedicated backend process. At 500 concurrent connections, PostgreSQL is spending significant memory and CPU just on connection overhead. Most connections spend most of their time idle, waiting for application code to execute.

PgBouncer is a lightweight connection pool that sits between your application and PostgreSQL. Applications connect to PgBouncer; PgBouncer maintains a smaller pool of actual PostgreSQL connections.

PgBouncer Pool Modes

Transaction pooling (most common for web apps): A PostgreSQL connection is assigned to a client only for the duration of a transaction. After COMMIT or ROLLBACK, the connection returns to the pool. 1000 application connections can share 20 PostgreSQL connections if transactions are short.

Session pooling: A PostgreSQL connection is assigned to a client for the entire session duration. Better compatibility (session-level state is preserved) but lower multiplexing.

Statement pooling: Connection released after each statement. Highest multiplexing, but incompatible with multi-statement transactions.

ini
# pgbouncer.ini (transaction pooling)
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
listen_port = 6432

Transaction Pooling Incompatibilities

Transaction pooling is the right default for web applications, but a few PostgreSQL features don't work with it:

SET session variables — use SET LOCAL inside a transaction, or set_config(..., true) for transaction-local config
Advisory locks (session-scoped) — use transaction-scoped advisory locks instead
Prepared statements — disable server-side prepared statements in your driver, or use PgBouncer's server_reset_query option
LISTEN/NOTIFY — requires a dedicated non-pooled connection

For most CRUD web applications, none of these are issues. If you use RLS with session variables (as in P-6), use set_config('app.tenant_id', $1, true) — the true flag makes it transaction-local, which is safe with transaction pooling.

Monitoring PgBouncer

sql
-- Connect to PgBouncer's admin database
psql -p 6432 pgbouncer

-- Pool status
SHOW POOLS;
-- Shows: database, user, cl_active, cl_waiting, sv_active, sv_idle, sv_used

-- Client list
SHOW CLIENTS;

-- Server connection list
SHOW SERVERS;

cl_waiting > 0 means clients are queued waiting for a pool connection — your default_pool_size may be too small or queries are taking too long.

Slow Query Logging

The simplest performance monitoring tool: log queries that take longer than a threshold.

sql
-- Log queries slower than 500ms
ALTER SYSTEM SET log_min_duration_statement = 500;
SELECT pg_reload_conf();

-- Also useful: log lock waits
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '1s';
SELECT pg_reload_conf();

Logs appear in PostgreSQL's log file (location shown by SHOW log_directory). In production, ship these to a log aggregator and set up alerts on slow query frequency.

`pg_stat_statements`

For aggregate slow query analysis, pg_stat_statements is the essential extension. It tracks cumulative statistics for every distinct query shape.

sql
-- Enable (requires restart)
-- In postgresql.conf: shared_preload_libraries = 'pg_stat_statements'

-- After restart:
CREATE EXTENSION pg_stat_statements;

-- Top 10 queries by total time
SELECT
  round(total_exec_time::numeric, 2) AS total_ms,
  calls,
  round(mean_exec_time::numeric, 2)  AS mean_ms,
  round(stddev_exec_time::numeric, 2) AS stddev_ms,
  left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Top 10 by average time (worst per-call performance)
SELECT
  round(mean_exec_time::numeric, 2) AS mean_ms,
  calls,
  left(query, 80) AS query
FROM pg_stat_statements
WHERE calls > 100   -- ignore rarely-called queries
ORDER BY mean_exec_time DESC
LIMIT 10;

This is your primary tool for finding what to optimize in production — it shows real cumulative cost, not just the worst individual queries.

Common Performance Patterns

Avoid `SELECT *`

sql
-- Bad: fetches all columns, including large TEXT/JSONB columns
SELECT * FROM articles WHERE id = 42;

-- Good: fetch only what you need
SELECT id, title, created_at FROM articles WHERE id = 42;

SELECT * prevents index-only scans and increases network transfer between PostgreSQL and your application.

Use `LIMIT` with `ORDER BY`

Without ORDER BY, LIMIT returns arbitrary rows — and PostgreSQL may still scan the whole table. With ORDER BY on an indexed column, PostgreSQL can use the index to fetch exactly the rows needed.

sql
-- Bad: full scan, then limit
SELECT * FROM events LIMIT 20;

-- Good: index scan on created_at, stops after 20 rows
SELECT * FROM events ORDER BY created_at DESC LIMIT 20;

Pagination: Keyset vs. OFFSET

OFFSET pagination gets slower with each page:

sql
-- Page 1000 at 20 items/page: scans and discards 19,980 rows
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 19980;

Keyset (cursor) pagination stays fast regardless of depth:

sql
-- First page
SELECT id, title, created_at FROM posts
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Next page: pass the last row's values as the cursor
SELECT id, title, created_at FROM posts
WHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;

Keyset pagination requires a composite index on (created_at DESC, id DESC) and works best for "load more" / infinite scroll UIs. For page-number navigation, OFFSET is sometimes necessary — just be aware of the performance cliff at high page numbers.

Partial Indexes for Hot Subsets

If your application frequently queries a small subset of a table (pending orders, unread notifications, active users), a partial index on that subset is much smaller and faster than a full index:

sql
-- Instead of indexing all orders by status:
CREATE INDEX idx_orders_status ON orders (status);

-- Index only the hot subset (pending orders are << 1% of all orders):
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';

-- This query uses the partial index:
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;

The partial index is tiny compared to a full index on status, fits in memory more easily, and updates only when status = 'pending' rows change.

Practical Checklist

When a query is slow, work through this list:

Run EXPLAIN (ANALYZE, BUFFERS) — understand the actual plan, not the estimated one
Look for Seq Scans on large tables with high Rows Removed by Filter — candidate for an index
Check estimated vs. actual row counts — large divergence means stale statistics; run ANALYZE
Check for loops=N on Nested Loop nodes — if N is large and the inner query touches a big table, you may have an N+1 or a missing index on the join column
Check Sort Method: external merge — increase work_mem for this query
Check Buffers: read=N on hot queries — large read counts suggest the working set isn't in shared_buffers
Check pg_stat_statements for cumulative slow queries — fix highest total_exec_time first
Count queries per request in your application — N+1 shows up as many fast identical queries, not one slow one

Summary

Application-level performance tuning follows a predictable pattern: measure with EXPLAIN ANALYZE and pg_stat_statements, fix the biggest issues first (missing indexes, N+1 queries), then tune configuration to match your hardware.

The parameters that matter most for application engineers are work_mem (sorts), random_page_cost (SSD vs. spinning disk planning), and log_min_duration_statement (finding slow queries). PgBouncer in transaction mode is the right connection pooling setup for most web applications.

Next up: P-9 — External Services, Caching Layers, and Deployment — ORMs vs. raw SQL, read replicas, materialised views, managed databases, and a production-ready deployment checklist.

PgBouncer in Depth — The Bugs That Only Appear in Production

The mention of PgBouncer earlier in this module covers the surface: install it, point your app at it, fewer connections. What it doesn't cover: the silent bugs that appear when you use PgBouncer in transaction mode with an ORM that uses prepared statements. These bugs don't appear in development. They don't appear in staging. They appear in production at 2am, intermittently, with no obvious cause.

Transaction Mode vs Session Mode

PgBouncer has three pooling modes:

Session mode: Each client connection gets a dedicated server connection for its entire duration. Zero compatibility issues — the client gets a real PostgreSQL session with all session-level state preserved. Essentially useless for connection reduction in serverless or short-lived connection environments — if every connection lasts 5ms, there's no pooling benefit.

Transaction mode: A server connection is allocated only for the duration of a transaction. Between transactions, the connection returns to the pool. 1000 application connections sharing 20 PostgreSQL server connections. This is where the performance gains are. This is also where the bugs are.

Statement mode: A server connection is allocated per SQL statement. Breaks multi-statement transactions entirely — unusable for most applications. Only relevant for specific read-only analytics workloads.

For most web applications, transaction mode is the right choice. The compatibility constraints are manageable once you know them.

The Prepared Statement Problem in Transaction Mode

Prepared statements in PostgreSQL are server-side objects: PREPARE name AS SELECT ... creates a named query plan on a specific backend connection. The plan lives on that connection. It does not exist on other connections.

When PgBouncer reassigns connections between transactions (which is the entire point of transaction mode), the next client gets a different backend. The prepared statement from the previous connection does not exist on this new backend.

Prisma uses prepared statements by default. Connecting Prisma to PgBouncer in transaction mode produces:

Error: prepared statement "s1" does not exist

This error is intermittent. It only fires when PgBouncer reassigns the connection between Prisma's prepare call and Prisma's execute call. In development with a single developer, PgBouncer reuse is low and you never hit it. In production with 50 concurrent users, PgBouncer is aggressively reusing connections and you hit it constantly.

Fix for Prisma: Add pgbouncer=true to your connection string:

DATABASE_URL="postgresql://user:password@pgbouncer-host:6432/mydb?pgbouncer=true&connection_limit=1"

pgbouncer=true tells Prisma to use the simple query protocol instead of the extended (prepared statement) protocol. Every query goes as a plain text query. You lose server-side query plan caching (Prisma caches plans on its side instead), but the compatibility issues disappear.

The connection_limit=1 is for serverless environments (Vercel, Lambda) where each function instance should hold at most one PgBouncer connection — otherwise each function instance opens multiple connections and you defeat the purpose of pooling.

node-postgres (pg): Prepared statements only fire when you explicitly use named queries:

javascript
// Uses prepared statements (breaks with transaction pooling)
await client.query({ name: 'get-user', text: 'SELECT * FROM users WHERE id = $1' }, [id]);

// Plain text query — safe with transaction pooling
await client.query('SELECT * FROM users WHERE id = $1', [id]);

If you are not using named queries in node-postgres, you are already safe.

Drizzle ORM: Uses $client.query() which maps to the simple query protocol. Safe with PgBouncer transaction mode without any configuration changes.

SQLAlchemy (Python): Set pool_pre_ping=True and use DISCARD ALL as the reset query. Or use NullPool for serverless functions.

Session-Level State and Transaction Mode

Any session-level state set outside a transaction is lost when PgBouncer reassigns the connection. This includes:

sql
-- BREAKS in transaction mode: SET without a transaction
SET search_path = myschema, public;
SET app.user_id = '42';

-- SAFE: SET LOCAL (scoped to the current transaction)
BEGIN;
SET LOCAL search_path = myschema, public;
SET LOCAL app.user_id = '42';
COMMIT;

-- SAFE: set_config with transaction-local flag
SELECT set_config('app.user_id', '42', true);  -- true = transaction-local

For Row-Level Security (as covered in P-6), always use the transaction-local form:

sql
SELECT set_config('app.current_tenant_id', $1::text, true);

Advisory locks (session-scoped) also break with transaction pooling. Use transaction-scoped advisory locks instead:

sql
-- Session-scoped (breaks with transaction pooling)
SELECT pg_advisory_lock(12345);
-- ... do work ...
SELECT pg_advisory_unlock(12345);

-- Transaction-scoped (safe with transaction pooling)
BEGIN;
SELECT pg_advisory_xact_lock(12345);
-- ... do work ...
COMMIT;  -- lock released automatically at transaction end

The `application_name` Trick for Pool Monitoring

Set application_name in your connection string to identify which service is using which pool slot. In a microservices environment where six services all connect to the same PgBouncer, this is the only way to see which service is causing cl_waiting to spike:

# Node.js / Prisma
DATABASE_URL="postgresql://user:password@pgbouncer:6432/mydb?application_name=api-server"

# Python / SQLAlchemy
engine = create_engine(url, connect_args={"application_name": "worker-service"})

In PgBouncer's admin console:

sql
PGPASSWORD=pgbouncer psql -h pgbouncer-host -p 6432 -U pgbouncer pgbouncer

SHOW POOLS;
-- database | user | cl_active | cl_waiting | sv_active | sv_idle | sv_used | maxwait
-- mydb     | user | 45        | 3          | 48        | 2       | 0       | 0

SHOW CLIENTS;
-- addr     | port  | local_addr | state  | addr  | connect_time | application_name
-- 10.0.0.5 | 54321 | 10.0.0.1  | active | ...   | ...          | api-server

cl_waiting > 0 is the alert signal: clients are waiting for a pool slot. Either pool_size is too small for current load, or queries are running slower than expected (holding connections longer). Check pg_stat_statements for mean query duration to distinguish the two causes.

maxwait shows how long the longest-waiting client has been waiting. A maxwait above 500ms means real latency impact on your application.

Sizing the Pool

The formula for server-side connection count:

server_connections = pool_size × number_of_pgbouncer_instances
server_connections < max_connections - superuser_reserved_connections

Where superuser_reserved_connections defaults to 3 (PostgreSQL reserves these for emergency admin access — do not count them in your application budget).

Example: PostgreSQL with max_connections = 100, one PgBouncer instance with default_pool_size = 20:

20 server connections available to applications
77 remaining connections available (100 - 20 - 3 superuser reserved)
Application can have effectively unlimited client connections, multiplexed into 20 server connections

Right-sizing pool_size using Little's Law:

pool_size = concurrency × avg_query_duration_seconds × headroom_multiplier

If you have 100 concurrent database-bound requests averaging 10ms each:

pool_size = 100 × 0.01 × 3 = 3 (with 3x headroom for variance)

This seems counterintuitively small. It's correct. Most web application queries finish in 1–20ms. Even at 1,000 req/sec with 10ms average query time, you only have 10 queries in-flight simultaneously. You need 10–30 pool slots, not 500.

Larger pool_size is not better. More server connections = more PostgreSQL memory (8–10MB per connection, including backend process overhead). For 100 connections: ~1GB just for connection overhead before any query data is loaded. For 500 connections: ~5GB. Underprovisioning pool_size causes cl_waiting spikes. Overprovisioning pool_size wastes memory and degrades PostgreSQL performance through connection overhead and lock contention.

Monitor cl_waiting in production, then adjust pool_size in small increments (5 at a time) until waits are consistently zero under normal load. Set an alert for when maxwait > 100ms — that's your early warning that the pool needs expansion.

PreviousModule P-7: Full-Text Search Next Module P-9: External Services, Caching Layers, and Deployment