Module P-1·20 min read

CTEs, window functions, upserts, subqueries, UNION — the SQL that separates proficient engineers from beginners.

P-1 — Advanced SQL: The Patterns You Will Use Every Week

Who this module is for: You completed Phase 1 and can write fundamental SQL. Phase 2 assumes you are building real applications and need the SQL patterns that separate engineers who can write queries from engineers who can write good queries. This module covers the constructs that appear in almost every production PostgreSQL codebase.

Common Table Expressions (CTEs) — `WITH` Clauses

A CTE creates a named, temporary result set within a query. It makes complex queries readable by breaking them into named steps.

sql
-- Without CTE: hard to read
SELECT u.display_name, task_counts.total
FROM users u
JOIN (
  SELECT assigned_to, COUNT(*) AS total
  FROM tasks
  WHERE status != 'done'
  GROUP BY assigned_to
) task_counts ON u.id = task_counts.assigned_to
ORDER BY task_counts.total DESC;

-- With CTE: same query, readable
WITH open_task_counts AS (
  SELECT assigned_to, COUNT(*) AS total
  FROM tasks
  WHERE status != 'done'
  GROUP BY assigned_to
)
SELECT u.display_name, otc.total
FROM users u
JOIN open_task_counts otc ON u.id = otc.assigned_to
ORDER BY otc.total DESC;

Chaining Multiple CTEs

sql
WITH
-- Step 1: get all orders in the last 30 days
recent_orders AS (
  SELECT customer_id, product_id, quantity, unit_price
  FROM orders
  WHERE ordered_at > NOW() - INTERVAL '30 days'
),
-- Step 2: calculate revenue per customer from those orders
customer_revenue AS (
  SELECT customer_id, SUM(quantity * unit_price) AS revenue
  FROM recent_orders
  GROUP BY customer_id
),
-- Step 3: rank customers by revenue
ranked_customers AS (
  SELECT
    customer_id,
    revenue,
    RANK() OVER (ORDER BY revenue DESC) AS rank
  FROM customer_revenue
)
SELECT
  c.name,
  rc.revenue,
  rc.rank
FROM ranked_customers rc
JOIN customers c ON rc.customer_id = c.id
WHERE rc.rank <= 10;  -- top 10 customers

CTEs execute once and are referenced by name — they are not re-executed for each reference. This is important for performance.

Recursive CTEs — Traversing Hierarchies

Recursive CTEs can traverse tree structures (org charts, categories with subcategories, file systems):

sql
-- Find all members of a team hierarchy, starting from a given manager
WITH RECURSIVE team_hierarchy AS (
  -- Base case: the starting employee
  SELECT id, name, manager_id, 0 AS depth
  FROM employees
  WHERE id = 3  -- start from VP Eng (id=3)

  UNION ALL

  -- Recursive case: find direct reports of everyone already in the result
  SELECT e.id, e.name, e.manager_id, th.depth + 1
  FROM employees e
  JOIN team_hierarchy th ON e.manager_id = th.id
)
SELECT
  REPEAT('  ', depth) || name AS indented_name,  -- indent by depth
  depth
FROM team_hierarchy
ORDER BY depth, name;

--   indented_name  | depth
-- -----------------+-------
--  VP Eng          |     0
--    Engineer 1    |     1
--    Engineer 2    |     1
--      Junior Dev  |     2

Window Functions — Computation Without Collapsing Rows

Aggregate functions collapse many rows into one. Window functions compute across a set of rows but keep every row in the output.

sql
-- Aggregate: collapses rows — one row per category
SELECT category, AVG(price) AS avg_price FROM products GROUP BY category;

-- Window function: keeps every row, adds the avg alongside each product
SELECT
  name,
  category,
  price,
  AVG(price) OVER (PARTITION BY category) AS category_avg,
  price - AVG(price) OVER (PARTITION BY category) AS diff_from_avg
FROM products;

--         name        | category  | price  | category_avg | diff_from_avg
-- --------------------+-----------+--------+--------------+---------------
--  Keyboard           | peripherals| 129.99 |     89.99    |     40.00
--  Mouse              | peripherals|  49.99 |     89.99    |    -40.00
--  Webcam             | peripherals|  89.99 |     89.99    |      0.00
--  Monitor Stand      | accessories|  39.99 |     43.32    |     -3.33
--  USB-C Hub          | accessories|  64.99 |     43.32    |     21.67
--  Laptop Sleeve      | accessories|  24.99 |     43.32    |    -18.33

The `OVER()` Clause

OVER() defines the "window" — the set of rows each calculation sees:

sql
-- PARTITION BY: like GROUP BY but doesn't collapse rows
OVER (PARTITION BY category)         -- each row sees its category's rows

-- ORDER BY: gives rows an order within the partition
OVER (PARTITION BY category ORDER BY price DESC)

-- No partition (all rows): each row sees the entire result set
OVER ()

Essential Window Functions

Ranking:

sql
SELECT
  name,
  price,
  category,
  ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS row_num,
  RANK()       OVER (PARTITION BY category ORDER BY price DESC) AS rank,
  DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) AS dense_rank
FROM products;

Difference	When there is a tie
`ROW_NUMBER()`	Assigns unique numbers (arbitrary tie-breaking)
`RANK()`	Tied rows get same rank; next rank skips numbers (1,1,3)
`DENSE_RANK()`	Tied rows get same rank; next rank does not skip (1,1,2)

Access previous/next rows:

sql
SELECT
  ordered_at::DATE AS date,
  SUM(quantity * unit_price) AS daily_revenue,
  LAG(SUM(quantity * unit_price), 1)  OVER (ORDER BY ordered_at::DATE) AS prev_day,
  LEAD(SUM(quantity * unit_price), 1) OVER (ORDER BY ordered_at::DATE) AS next_day
FROM orders
GROUP BY ordered_at::DATE
ORDER BY date;
-- Shows each day's revenue alongside the previous and next day's revenue

Running totals:

sql
SELECT
  ordered_at::DATE AS date,
  SUM(quantity * unit_price) AS daily_revenue,
  SUM(SUM(quantity * unit_price)) OVER (ORDER BY ordered_at::DATE) AS cumulative_revenue
FROM orders
GROUP BY ordered_at::DATE
ORDER BY date;

Percentile:

sql
SELECT
  name,
  price,
  NTILE(4) OVER (ORDER BY price) AS price_quartile
  -- 1=cheapest 25%, 4=most expensive 25%
FROM products;

Subqueries

A subquery is a query nested inside another query.

Scalar subquery (returns one value)

sql
-- Find products priced above the overall average
SELECT name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);

`IN` subquery (returns a list)

sql
-- Find customers who have placed at least one order
SELECT name FROM customers
WHERE id IN (SELECT DISTINCT customer_id FROM orders);

-- Find customers who have NEVER ordered
SELECT name FROM customers
WHERE id NOT IN (SELECT DISTINCT customer_id FROM orders WHERE customer_id IS NOT NULL);

`EXISTS` — Efficient Existence Check

EXISTS stops as soon as it finds the first match — more efficient than IN for large tables:

sql
-- Find customers who have at least one high-value order
SELECT c.name
FROM customers c
WHERE EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.id
    AND o.quantity * o.unit_price > 200
);
-- "SELECT 1" — EXISTS only cares whether a row exists, not what it contains

Correlated subquery (references outer query)

sql
-- Find the most expensive product in each category
SELECT name, price, category
FROM products p1
WHERE price = (
  SELECT MAX(price)
  FROM products p2
  WHERE p2.category = p1.category  -- references the outer query's row
);

`INSERT ... ON CONFLICT` — Upsert

Upsert inserts a row if it does not exist, updates it if it does. Essential for idempotent writes.

sql
-- Insert or update a product (based on unique SKU)
INSERT INTO products (sku, name, price, in_stock)
VALUES ('KB-001', 'Mechanical Keyboard', 129.99, true)
ON CONFLICT (sku)
DO UPDATE SET
  price    = EXCLUDED.price,
  in_stock = EXCLUDED.in_stock,
  name     = EXCLUDED.name;
-- EXCLUDED.* refers to the values that were attempted to be inserted

sql
-- Insert or ignore (do nothing if duplicate)
INSERT INTO users (email)
VALUES ('alice@example.com')
ON CONFLICT (email) DO NOTHING;
-- No error, no update — silently skips the insert if email exists

sql
-- Upsert with conditional update (only update if the new value is higher)
INSERT INTO product_stats (product_id, views)
VALUES (1, 100)
ON CONFLICT (product_id)
DO UPDATE SET
  views = product_stats.views + EXCLUDED.views;
-- Adds new views to existing views on conflict

`RETURNING` with DML — Get Data Back Without a Second Query

RETURNING retrieves data from modified rows in the same statement:

sql
-- Insert and get the generated ID back
INSERT INTO tasks (project_id, created_by, title)
VALUES (1, 1, 'New task')
RETURNING id, created_at;

-- Update and see what changed
UPDATE products
SET price = price * 0.9
WHERE category = 'accessories'
RETURNING id, name, price AS new_price;

-- Delete and retrieve what was removed
DELETE FROM tasks
WHERE status = 'cancelled' AND created_at < NOW() - INTERVAL '30 days'
RETURNING id, title, project_id;

`UNION`, `INTERSECT`, `EXCEPT`

Combine result sets from multiple SELECT statements:

sql
-- UNION: all rows from both queries (removes duplicates)
SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers;

-- UNION ALL: all rows including duplicates (faster — no deduplication)
SELECT 'New' AS type, id, title FROM tasks WHERE status = 'todo'
UNION ALL
SELECT 'Active' AS type, id, title FROM tasks WHERE status = 'in_progress';

-- INTERSECT: only rows that appear in both
SELECT email FROM customers
INTERSECT
SELECT email FROM newsletter_subscribers;
-- Customers who are also subscribers

-- EXCEPT: rows in first but not second
SELECT email FROM customers
EXCEPT
SELECT email FROM newsletter_subscribers;
-- Customers who are NOT subscribers (compare with LEFT JOIN ... WHERE IS NULL)

`CASE` Expressions — Inline Conditionals

sql
-- Categorise prices
SELECT
  name,
  price,
  CASE
    WHEN price < 30  THEN 'budget'
    WHEN price < 100 THEN 'mid-range'
    ELSE                  'premium'
  END AS price_tier
FROM products;

-- Count by status using conditional aggregation
SELECT
  COUNT(*) FILTER (WHERE status = 'todo')        AS todo_count,
  COUNT(*) FILTER (WHERE status = 'in_progress') AS active_count,
  COUNT(*) FILTER (WHERE status = 'done')        AS done_count
FROM tasks
WHERE project_id = 1;

-- Equivalent with CASE:
SELECT
  SUM(CASE WHEN status = 'todo'        THEN 1 ELSE 0 END) AS todo_count,
  SUM(CASE WHEN status = 'in_progress' THEN 1 ELSE 0 END) AS active_count,
  SUM(CASE WHEN status = 'done'        THEN 1 ELSE 0 END) AS done_count
FROM tasks
WHERE project_id = 1;

Practical Exercise: Task Dashboard Queries

Using the task manager schema from F-7:

sql
-- Q1: For each project, show total tasks, done tasks, and % completion
WITH task_summary AS (
  SELECT
    project_id,
    COUNT(*)                                           AS total,
    COUNT(*) FILTER (WHERE status = 'done')            AS done,
    COUNT(*) FILTER (WHERE status = 'in_progress')     AS in_progress
  FROM tasks
  GROUP BY project_id
)
SELECT
  p.name AS project,
  ts.total,
  ts.done,
  ts.in_progress,
  ROUND(100.0 * ts.done / NULLIF(ts.total, 0), 1) AS pct_complete
FROM projects p
JOIN task_summary ts ON p.id = ts.project_id;

-- Q2: Rank users by number of tasks they have completed
SELECT
  u.display_name,
  COUNT(*) AS completed_tasks,
  RANK() OVER (ORDER BY COUNT(*) DESC) AS rank
FROM tasks t
JOIN users u ON t.assigned_to = u.id
WHERE t.status = 'done'
GROUP BY u.id, u.display_name;

-- Q3: For each task, show how long it has been open (or was open)
SELECT
  title,
  status,
  created_at,
  COALESCE(completed_at, NOW()) AS closed_or_now,
  AGE(COALESCE(completed_at, NOW()), created_at) AS age
FROM tasks
ORDER BY created_at;

-- Q4: Upsert a task status (idempotent status update)
INSERT INTO tasks (id, project_id, created_by, title, status)
VALUES (1, 1, 1, 'Existing Task', 'in_progress')
ON CONFLICT (id) DO UPDATE SET
  status = EXCLUDED.status,
  updated_at = NOW();

-- Q5: Find projects with no tasks
SELECT p.name, p.created_at
FROM projects p
WHERE NOT EXISTS (
  SELECT 1 FROM tasks t WHERE t.project_id = p.id
);

Summary

Feature	When to Use
CTE (`WITH`)	Break complex queries into readable named steps
Recursive CTE	Traverse hierarchical data (org charts, categories)
`ROW_NUMBER()`	Unique sequential position
`RANK()` / `DENSE_RANK()`	Position with tied rows
`LAG()` / `LEAD()`	Access previous/next row values
Running `SUM()` OVER	Cumulative totals
Scalar subquery	Single-value comparison (`price > (SELECT AVG...)`)
`EXISTS`	Efficient check for row existence
`ON CONFLICT DO UPDATE`	Upsert — insert or update atomically
`ON CONFLICT DO NOTHING`	Insert if not exists, ignore if duplicate
`RETURNING`	Get modified row data back without a second query
`FILTER (WHERE ...)`	Conditional aggregation without subqueries
`CASE WHEN`	Inline conditional logic in SELECT

Module P-2 covers indexing from a practitioner's perspective — not the internals (that is Phase 3), but when to add an index, when not to, and how to read EXPLAIN output to measure the difference.

Next: P-2 — Indexes — When and How to Add Them →

PreviousModule F-7: Your First Real Application Schema Next Module P-2: Indexes — When and How to Add Them

P-1 — Advanced SQL: The Patterns You Will Use Every Week

Common Table Expressions (CTEs) — WITH Clauses

Chaining Multiple CTEs

Recursive CTEs — Traversing Hierarchies

Window Functions — Computation Without Collapsing Rows

The OVER() Clause

Essential Window Functions

Subqueries

Scalar subquery (returns one value)

IN subquery (returns a list)

EXISTS — Efficient Existence Check

Correlated subquery (references outer query)

INSERT ... ON CONFLICT — Upsert

RETURNING with DML — Get Data Back Without a Second Query

UNION, INTERSECT, EXCEPT

CASE Expressions — Inline Conditionals