Back/Module F-4 Aggregation — Summarising Your Data
Module F-4·18 min read

COUNT, SUM, AVG, GROUP BY, HAVING — the mental model that confuses beginners, explained from scratch.

F-4 — Aggregation — Summarising Your Data

Who this module is for: You completed F-3 and can filter and sort individual rows. Now you need to answer questions about groups of rows — "how many products are in each category?", "what is the average price?", "which categories have more than 5 products?". This module covers aggregation — the most conceptually tricky part of SQL for beginners.


The Problem Aggregation Solves

So far, every query has returned rows from the table — one row of output per row of input (or fewer after filtering). Aggregation is different: it collapses multiple rows into a single summary value.

Before aggregation (6 rows, 6 output rows):
  Keyboard     | peripherals | 129.99
  Mouse        | peripherals |  49.99
  Monitor Stand| accessories |  39.99
  USB-C Hub    | accessories |  64.99
  Laptop Sleeve| accessories |  24.99
  Webcam       | peripherals |  89.99

After aggregation (collapse to summary per category, 2 output rows):
  peripherals | 3 products | avg $89.99
  accessories | 3 products | avg $43.32

Aggregate Functions

These functions take a column of values and return a single summary value:

sql
-- COUNT: how many rows? SELECT COUNT(*) FROM products; -- total rows SELECT COUNT(description) FROM products; -- rows where description is NOT NULL SELECT COUNT(DISTINCT category) FROM products; -- unique values -- SUM: total of a numeric column SELECT SUM(price) FROM products; -- AVG: average value SELECT AVG(price) FROM products; -- MIN / MAX: smallest or largest value SELECT MIN(price) FROM products; SELECT MAX(price) FROM products; -- ROUND: clean up decimal output SELECT ROUND(AVG(price), 2) FROM products; -- round to 2 decimal places

When you use an aggregate function without GROUP BY, it collapses the entire table into a single row:

sql
SELECT COUNT(*) AS total_products, ROUND(AVG(price), 2) AS avg_price, MIN(price) AS cheapest, MAX(price) AS most_expensive, SUM(price) AS total_value FROM products; -- total_products | avg_price | cheapest | most_expensive | total_value -- ----------------+-----------+----------+----------------+------------- -- 5 | 74.99 | 39.99 | 129.99 | 374.95

GROUP BY — The Mental Model

GROUP BY is where most beginners get confused. The mental model:

  1. PostgreSQL divides all your rows into groups based on the column(s) you specify
  2. Each group is collapsed into a single output row
  3. You can then apply aggregate functions to each group separately
sql
-- "How many products are in each category?" SELECT category, COUNT(*) AS product_count FROM products GROUP BY category; -- category | product_count -- --------------+--------------- -- accessories | 3 -- peripherals | 3

Visualise what happens:

Full table:
  Keyboard      | peripherals          ─┐
  Mouse         | peripherals           ├─ GROUP 1: peripherals (3 rows)
  Webcam        | peripherals          ─┘
  Monitor Stand | accessories          ─┐
  USB-C Hub     | accessories           ├─ GROUP 2: accessories (3 rows)
  Laptop Sleeve | accessories          ─┘

After GROUP BY category + COUNT(*):
  peripherals  | 3
  accessories  | 3

The Golden Rule of GROUP BY

Every column in your SELECT list must either be in the GROUP BY clause OR be wrapped in an aggregate function.

sql
-- ✅ CORRECT: category is in GROUP BY, COUNT(*) is an aggregate SELECT category, COUNT(*) FROM products GROUP BY category; -- ✅ CORRECT: category in GROUP BY, AVG is an aggregate SELECT category, AVG(price) FROM products GROUP BY category; -- ❌ ERROR: name is neither in GROUP BY nor an aggregate SELECT category, name, COUNT(*) FROM products GROUP BY category; -- ERROR: column "products.name" must appear in the GROUP BY clause -- or be used in an aggregate function

This error trips up beginners constantly. The reason: if you group by category, and there are 3 rows in the "peripherals" group, which name value should PostgreSQL return? There are three of them. It cannot pick one arbitrarily — so it forces you to either group by name too (making each row unique) or use an aggregate.

Multiple columns in GROUP BY

sql
-- "How many in-stock vs out-of-stock products per category?" SELECT category, in_stock, COUNT(*) AS count FROM products GROUP BY category, in_stock ORDER BY category, in_stock; -- category | in_stock | count -- --------------+----------+------- -- accessories | f | 1 -- accessories | t | 2 -- peripherals | t | 3

When you GROUP BY multiple columns, each unique combination of those columns becomes one output row.


Common Aggregation Patterns

sql
-- Total value of inventory by category SELECT category, COUNT(*) AS num_products, SUM(price) AS total_value, ROUND(AVG(price), 2) AS avg_price FROM products GROUP BY category ORDER BY total_value DESC; -- Cheapest and most expensive per category SELECT category, MIN(price) AS cheapest, MAX(price) AS most_expensive, MAX(price) - MIN(price) AS price_range FROM products GROUP BY category; -- Count in-stock vs out-of-stock across the whole catalog SELECT in_stock, COUNT(*) AS count FROM products GROUP BY in_stock;

HAVING — Filtering Groups

WHERE filters individual rows before grouping. HAVING filters groups after aggregation.

sql
-- Categories with more than 2 products SELECT category, COUNT(*) AS product_count FROM products GROUP BY category HAVING COUNT(*) > 2; -- Categories where average price is above $50 SELECT category, ROUND(AVG(price), 2) AS avg_price FROM products GROUP BY category HAVING AVG(price) > 50;

The difference between WHERE and HAVING:

sql
-- WHERE: filter rows BEFORE grouping -- "Count products per category, but only count products priced over $30" SELECT category, COUNT(*) AS count FROM products WHERE price > 30 -- remove cheap products before counting GROUP BY category; -- HAVING: filter groups AFTER grouping -- "Only show categories where the count is greater than 2" SELECT category, COUNT(*) AS count FROM products GROUP BY category HAVING COUNT(*) > 2; -- remove groups with few products -- Combined: WHERE filters rows, HAVING filters the resulting groups SELECT category, COUNT(*) AS count FROM products WHERE price > 30 -- first: only count products over $30 GROUP BY category HAVING COUNT(*) > 1; -- then: only show categories with more than 1 qualifying product

The Execution Order

Understanding the order PostgreSQL executes a query helps predict how WHERE vs HAVING behaves:

1. FROM     — identify the table
2. WHERE    — filter individual rows
3. GROUP BY — divide remaining rows into groups
4. HAVING   — filter groups
5. SELECT   — compute output columns
6. ORDER BY — sort the result
7. LIMIT    — return only N rows

This is why you cannot use a column alias from SELECT in a WHERE clause — WHERE runs before SELECT:

sql
-- ❌ ERROR: 'avg_price' alias is not available in WHERE SELECT category, AVG(price) AS avg_price FROM products WHERE avg_price > 50 -- ERROR: column "avg_price" does not exist GROUP BY category; -- ✅ CORRECT: use HAVING (which runs after SELECT-level computation) -- Or repeat the expression: SELECT category, AVG(price) AS avg_price FROM products GROUP BY category HAVING AVG(price) > 50;

COUNT(*) vs COUNT(column) vs COUNT(DISTINCT column)

These three are different:

sql
-- Setup: add a product with NULL description INSERT INTO products (name, price) VALUES ('Mystery Box', 19.99); -- This row has description = NULL -- COUNT(*): counts every row, including those with NULLs SELECT COUNT(*) FROM products; -- 7 (includes Mystery Box) -- COUNT(column): counts only rows where the column is NOT NULL SELECT COUNT(description) FROM products; -- 6 (excludes Mystery Box) -- COUNT(DISTINCT column): counts unique non-NULL values SELECT COUNT(DISTINCT category) FROM products; -- 2 (peripherals, accessories) -- Note: general category from the Mystery Box is also NULL, doesn't count

COALESCE and NULLIF in Aggregations

NULL values are silently ignored by aggregate functions — but this can surprise you:

sql
-- AVG ignores NULL values (doesn't count them in the denominator) SELECT AVG(description) FROM products; -- ignores rows where description is NULL -- Replace NULL with a default value using COALESCE SELECT name, COALESCE(description, 'No description provided') AS description FROM products; -- NULLIF: return NULL if two values are equal (useful to avoid division by zero) SELECT category, SUM(price) / NULLIF(COUNT(*), 0) AS manual_avg FROM products GROUP BY category; -- Without NULLIF: if somehow COUNT(*) were 0, you'd get a division by zero error -- With NULLIF: returns NULL instead

Practical Exercise: Sales Analysis

Create a sales table and practice aggregation:

sql
CREATE TABLE sales ( id BIGSERIAL PRIMARY KEY, product TEXT NOT NULL, region TEXT NOT NULL, quantity INTEGER NOT NULL, unit_price NUMERIC(10,2) NOT NULL, sale_date DATE NOT NULL ); INSERT INTO sales (product, region, quantity, unit_price, sale_date) VALUES ('Keyboard', 'North', 2, 129.99, '2026-01-15'), ('Mouse', 'North', 5, 49.99, '2026-01-15'), ('Keyboard', 'South', 1, 129.99, '2026-01-16'), ('Webcam', 'North', 3, 89.99, '2026-01-17'), ('Mouse', 'South', 4, 49.99, '2026-02-01'), ('Keyboard', 'East', 2, 129.99, '2026-02-03'), ('Webcam', 'South', 1, 89.99, '2026-02-10'), ('Mouse', 'East', 6, 49.99, '2026-02-15'), ('Keyboard', 'North', 3, 129.99, '2026-03-01'), ('Webcam', 'East', 2, 89.99, '2026-03-05');

Now answer these questions:

sql
-- Q1: Total revenue across all sales SELECT SUM(quantity * unit_price) AS total_revenue FROM sales; -- Q2: Total revenue by product, highest first SELECT product, SUM(quantity * unit_price) AS revenue, SUM(quantity) AS units_sold FROM sales GROUP BY product ORDER BY revenue DESC; -- Q3: Total revenue by region SELECT region, SUM(quantity * unit_price) AS revenue FROM sales GROUP BY region ORDER BY revenue DESC; -- Q4: Average order value per region SELECT region, ROUND(AVG(quantity * unit_price), 2) AS avg_order_value, COUNT(*) AS num_orders FROM sales GROUP BY region; -- Q5: Products that sold more than 5 units in total SELECT product, SUM(quantity) AS total_units FROM sales GROUP BY product HAVING SUM(quantity) > 5 ORDER BY total_units DESC; -- Q6: Monthly revenue (group by year-month) SELECT DATE_TRUNC('month', sale_date) AS month, SUM(quantity * unit_price) AS revenue FROM sales GROUP BY DATE_TRUNC('month', sale_date) ORDER BY month; -- Q7: Best-selling product per region SELECT region, product, SUM(quantity) AS units FROM sales GROUP BY region, product ORDER BY region, units DESC;

Summary

ConceptWhat it does
COUNT(*)Count all rows
COUNT(col)Count non-NULL values in column
COUNT(DISTINCT col)Count unique non-NULL values
SUM(col)Sum all values
AVG(col)Average of all non-NULL values
MIN(col)Smallest value
MAX(col)Largest value
GROUP BY colCollapse rows with same col value into one group
HAVING conditionFilter groups after aggregation
WHERE vs HAVINGWHERE filters rows before grouping; HAVING filters groups after
COALESCE(val, default)Replace NULL with a default
NULLIF(a, b)Return NULL if a equals b

The rule that saves beginners from errors: every non-aggregated column in SELECT must be in GROUP BY.


Module F-5 covers joins — the feature that makes relational databases powerful by connecting data spread across multiple tables.

Next: F-5 — Connecting Tables — Joins Demystified →

Discussion