Back/Module P-1 Advanced SQL — The Patterns You Will Use Every Week
Module P-1·20 min read

CTEs, window functions, upserts, subqueries, UNION — the SQL that separates proficient engineers from beginners.

P-1 — Advanced SQL: The Patterns You Will Use Every Week

Who this module is for: You completed Phase 1 and can write fundamental SQL. Phase 2 assumes you are building real applications and need the SQL patterns that separate engineers who can write queries from engineers who can write good queries. This module covers the constructs that appear in almost every production PostgreSQL codebase.


Common Table Expressions (CTEs) — WITH Clauses

A CTE creates a named, temporary result set within a query. It makes complex queries readable by breaking them into named steps.

sql
-- Without CTE: hard to read SELECT u.display_name, task_counts.total FROM users u JOIN ( SELECT assigned_to, COUNT(*) AS total FROM tasks WHERE status != 'done' GROUP BY assigned_to ) task_counts ON u.id = task_counts.assigned_to ORDER BY task_counts.total DESC; -- With CTE: same query, readable WITH open_task_counts AS ( SELECT assigned_to, COUNT(*) AS total FROM tasks WHERE status != 'done' GROUP BY assigned_to ) SELECT u.display_name, otc.total FROM users u JOIN open_task_counts otc ON u.id = otc.assigned_to ORDER BY otc.total DESC;

Chaining Multiple CTEs

sql
WITH -- Step 1: get all orders in the last 30 days recent_orders AS ( SELECT customer_id, product_id, quantity, unit_price FROM orders WHERE ordered_at > NOW() - INTERVAL '30 days' ), -- Step 2: calculate revenue per customer from those orders customer_revenue AS ( SELECT customer_id, SUM(quantity * unit_price) AS revenue FROM recent_orders GROUP BY customer_id ), -- Step 3: rank customers by revenue ranked_customers AS ( SELECT customer_id, revenue, RANK() OVER (ORDER BY revenue DESC) AS rank FROM customer_revenue ) SELECT c.name, rc.revenue, rc.rank FROM ranked_customers rc JOIN customers c ON rc.customer_id = c.id WHERE rc.rank <= 10; -- top 10 customers

CTEs execute once and are referenced by name — they are not re-executed for each reference. This is important for performance.

Recursive CTEs — Traversing Hierarchies

Recursive CTEs can traverse tree structures (org charts, categories with subcategories, file systems):

sql
-- Find all members of a team hierarchy, starting from a given manager WITH RECURSIVE team_hierarchy AS ( -- Base case: the starting employee SELECT id, name, manager_id, 0 AS depth FROM employees WHERE id = 3 -- start from VP Eng (id=3) UNION ALL -- Recursive case: find direct reports of everyone already in the result SELECT e.id, e.name, e.manager_id, th.depth + 1 FROM employees e JOIN team_hierarchy th ON e.manager_id = th.id ) SELECT REPEAT(' ', depth) || name AS indented_name, -- indent by depth depth FROM team_hierarchy ORDER BY depth, name; -- indented_name | depth -- -----------------+------- -- VP Eng | 0 -- Engineer 1 | 1 -- Engineer 2 | 1 -- Junior Dev | 2

Window Functions — Computation Without Collapsing Rows

Aggregate functions collapse many rows into one. Window functions compute across a set of rows but keep every row in the output.

sql
-- Aggregate: collapses rows — one row per category SELECT category, AVG(price) AS avg_price FROM products GROUP BY category; -- Window function: keeps every row, adds the avg alongside each product SELECT name, category, price, AVG(price) OVER (PARTITION BY category) AS category_avg, price - AVG(price) OVER (PARTITION BY category) AS diff_from_avg FROM products; -- name | category | price | category_avg | diff_from_avg -- --------------------+-----------+--------+--------------+--------------- -- Keyboard | peripherals| 129.99 | 89.99 | 40.00 -- Mouse | peripherals| 49.99 | 89.99 | -40.00 -- Webcam | peripherals| 89.99 | 89.99 | 0.00 -- Monitor Stand | accessories| 39.99 | 43.32 | -3.33 -- USB-C Hub | accessories| 64.99 | 43.32 | 21.67 -- Laptop Sleeve | accessories| 24.99 | 43.32 | -18.33

The OVER() Clause

OVER() defines the "window" — the set of rows each calculation sees:

sql
-- PARTITION BY: like GROUP BY but doesn't collapse rows OVER (PARTITION BY category) -- each row sees its category's rows -- ORDER BY: gives rows an order within the partition OVER (PARTITION BY category ORDER BY price DESC) -- No partition (all rows): each row sees the entire result set OVER ()

Essential Window Functions

Ranking:

sql
SELECT name, price, category, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS row_num, RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rank, DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) AS dense_rank FROM products;
DifferenceWhen there is a tie
ROW_NUMBER()Assigns unique numbers (arbitrary tie-breaking)
RANK()Tied rows get same rank; next rank skips numbers (1,1,3)
DENSE_RANK()Tied rows get same rank; next rank does not skip (1,1,2)

Access previous/next rows:

sql
SELECT ordered_at::DATE AS date, SUM(quantity * unit_price) AS daily_revenue, LAG(SUM(quantity * unit_price), 1) OVER (ORDER BY ordered_at::DATE) AS prev_day, LEAD(SUM(quantity * unit_price), 1) OVER (ORDER BY ordered_at::DATE) AS next_day FROM orders GROUP BY ordered_at::DATE ORDER BY date; -- Shows each day's revenue alongside the previous and next day's revenue

Running totals:

sql
SELECT ordered_at::DATE AS date, SUM(quantity * unit_price) AS daily_revenue, SUM(SUM(quantity * unit_price)) OVER (ORDER BY ordered_at::DATE) AS cumulative_revenue FROM orders GROUP BY ordered_at::DATE ORDER BY date;

Percentile:

sql
SELECT name, price, NTILE(4) OVER (ORDER BY price) AS price_quartile -- 1=cheapest 25%, 4=most expensive 25% FROM products;

Subqueries

A subquery is a query nested inside another query.

Scalar subquery (returns one value)

sql
-- Find products priced above the overall average SELECT name, price FROM products WHERE price > (SELECT AVG(price) FROM products);

IN subquery (returns a list)

sql
-- Find customers who have placed at least one order SELECT name FROM customers WHERE id IN (SELECT DISTINCT customer_id FROM orders); -- Find customers who have NEVER ordered SELECT name FROM customers WHERE id NOT IN (SELECT DISTINCT customer_id FROM orders WHERE customer_id IS NOT NULL);

EXISTS — Efficient Existence Check

EXISTS stops as soon as it finds the first match — more efficient than IN for large tables:

sql
-- Find customers who have at least one high-value order SELECT c.name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.quantity * o.unit_price > 200 ); -- "SELECT 1" — EXISTS only cares whether a row exists, not what it contains

Correlated subquery (references outer query)

sql
-- Find the most expensive product in each category SELECT name, price, category FROM products p1 WHERE price = ( SELECT MAX(price) FROM products p2 WHERE p2.category = p1.category -- references the outer query's row );

INSERT ... ON CONFLICT — Upsert

Upsert inserts a row if it does not exist, updates it if it does. Essential for idempotent writes.

sql
-- Insert or update a product (based on unique SKU) INSERT INTO products (sku, name, price, in_stock) VALUES ('KB-001', 'Mechanical Keyboard', 129.99, true) ON CONFLICT (sku) DO UPDATE SET price = EXCLUDED.price, in_stock = EXCLUDED.in_stock, name = EXCLUDED.name; -- EXCLUDED.* refers to the values that were attempted to be inserted
sql
-- Insert or ignore (do nothing if duplicate) INSERT INTO users (email) VALUES ('alice@example.com') ON CONFLICT (email) DO NOTHING; -- No error, no update — silently skips the insert if email exists
sql
-- Upsert with conditional update (only update if the new value is higher) INSERT INTO product_stats (product_id, views) VALUES (1, 100) ON CONFLICT (product_id) DO UPDATE SET views = product_stats.views + EXCLUDED.views; -- Adds new views to existing views on conflict

RETURNING with DML — Get Data Back Without a Second Query

RETURNING retrieves data from modified rows in the same statement:

sql
-- Insert and get the generated ID back INSERT INTO tasks (project_id, created_by, title) VALUES (1, 1, 'New task') RETURNING id, created_at; -- Update and see what changed UPDATE products SET price = price * 0.9 WHERE category = 'accessories' RETURNING id, name, price AS new_price; -- Delete and retrieve what was removed DELETE FROM tasks WHERE status = 'cancelled' AND created_at < NOW() - INTERVAL '30 days' RETURNING id, title, project_id;

UNION, INTERSECT, EXCEPT

Combine result sets from multiple SELECT statements:

sql
-- UNION: all rows from both queries (removes duplicates) SELECT email FROM customers UNION SELECT email FROM newsletter_subscribers; -- UNION ALL: all rows including duplicates (faster — no deduplication) SELECT 'New' AS type, id, title FROM tasks WHERE status = 'todo' UNION ALL SELECT 'Active' AS type, id, title FROM tasks WHERE status = 'in_progress'; -- INTERSECT: only rows that appear in both SELECT email FROM customers INTERSECT SELECT email FROM newsletter_subscribers; -- Customers who are also subscribers -- EXCEPT: rows in first but not second SELECT email FROM customers EXCEPT SELECT email FROM newsletter_subscribers; -- Customers who are NOT subscribers (compare with LEFT JOIN ... WHERE IS NULL)

CASE Expressions — Inline Conditionals

sql
-- Categorise prices SELECT name, price, CASE WHEN price < 30 THEN 'budget' WHEN price < 100 THEN 'mid-range' ELSE 'premium' END AS price_tier FROM products; -- Count by status using conditional aggregation SELECT COUNT(*) FILTER (WHERE status = 'todo') AS todo_count, COUNT(*) FILTER (WHERE status = 'in_progress') AS active_count, COUNT(*) FILTER (WHERE status = 'done') AS done_count FROM tasks WHERE project_id = 1; -- Equivalent with CASE: SELECT SUM(CASE WHEN status = 'todo' THEN 1 ELSE 0 END) AS todo_count, SUM(CASE WHEN status = 'in_progress' THEN 1 ELSE 0 END) AS active_count, SUM(CASE WHEN status = 'done' THEN 1 ELSE 0 END) AS done_count FROM tasks WHERE project_id = 1;

Practical Exercise: Task Dashboard Queries

Using the task manager schema from F-7:

sql
-- Q1: For each project, show total tasks, done tasks, and % completion WITH task_summary AS ( SELECT project_id, COUNT(*) AS total, COUNT(*) FILTER (WHERE status = 'done') AS done, COUNT(*) FILTER (WHERE status = 'in_progress') AS in_progress FROM tasks GROUP BY project_id ) SELECT p.name AS project, ts.total, ts.done, ts.in_progress, ROUND(100.0 * ts.done / NULLIF(ts.total, 0), 1) AS pct_complete FROM projects p JOIN task_summary ts ON p.id = ts.project_id; -- Q2: Rank users by number of tasks they have completed SELECT u.display_name, COUNT(*) AS completed_tasks, RANK() OVER (ORDER BY COUNT(*) DESC) AS rank FROM tasks t JOIN users u ON t.assigned_to = u.id WHERE t.status = 'done' GROUP BY u.id, u.display_name; -- Q3: For each task, show how long it has been open (or was open) SELECT title, status, created_at, COALESCE(completed_at, NOW()) AS closed_or_now, AGE(COALESCE(completed_at, NOW()), created_at) AS age FROM tasks ORDER BY created_at; -- Q4: Upsert a task status (idempotent status update) INSERT INTO tasks (id, project_id, created_by, title, status) VALUES (1, 1, 1, 'Existing Task', 'in_progress') ON CONFLICT (id) DO UPDATE SET status = EXCLUDED.status, updated_at = NOW(); -- Q5: Find projects with no tasks SELECT p.name, p.created_at FROM projects p WHERE NOT EXISTS ( SELECT 1 FROM tasks t WHERE t.project_id = p.id );

Summary

FeatureWhen to Use
CTE (WITH)Break complex queries into readable named steps
Recursive CTETraverse hierarchical data (org charts, categories)
ROW_NUMBER()Unique sequential position
RANK() / DENSE_RANK()Position with tied rows
LAG() / LEAD()Access previous/next row values
Running SUM() OVERCumulative totals
Scalar subquerySingle-value comparison (price > (SELECT AVG...))
EXISTSEfficient check for row existence
ON CONFLICT DO UPDATEUpsert — insert or update atomically
ON CONFLICT DO NOTHINGInsert if not exists, ignore if duplicate
RETURNINGGet modified row data back without a second query
FILTER (WHERE ...)Conditional aggregation without subqueries
CASE WHENInline conditional logic in SELECT

Module P-2 covers indexing from a practitioner's perspective — not the internals (that is Phase 3), but when to add an index, when not to, and how to read EXPLAIN output to measure the difference.

Next: P-2 — Indexes — When and How to Add Them →

Discussion