Module F-4·18 min read

COUNT, SUM, AVG, GROUP BY, HAVING — the mental model that confuses beginners, explained from scratch.

F-4 — Aggregation — Summarising Your Data

Who this module is for: You completed F-3 and can filter and sort individual rows. Now you need to answer questions about groups of rows — "how many products are in each category?", "what is the average price?", "which categories have more than 5 products?". This module covers aggregation — the most conceptually tricky part of SQL for beginners.

The Problem Aggregation Solves

So far, every query has returned rows from the table — one row of output per row of input (or fewer after filtering). Aggregation is different: it collapses multiple rows into a single summary value.

Before aggregation (6 rows, 6 output rows):
  Keyboard     | peripherals | 129.99
  Mouse        | peripherals |  49.99
  Monitor Stand| accessories |  39.99
  USB-C Hub    | accessories |  64.99
  Laptop Sleeve| accessories |  24.99
  Webcam       | peripherals |  89.99

After aggregation (collapse to summary per category, 2 output rows):
  peripherals | 3 products | avg $89.99
  accessories | 3 products | avg $43.32

Aggregate Functions

These functions take a column of values and return a single summary value:

sql
-- COUNT: how many rows?
SELECT COUNT(*) FROM products;               -- total rows
SELECT COUNT(description) FROM products;     -- rows where description is NOT NULL
SELECT COUNT(DISTINCT category) FROM products; -- unique values

-- SUM: total of a numeric column
SELECT SUM(price) FROM products;

-- AVG: average value
SELECT AVG(price) FROM products;

-- MIN / MAX: smallest or largest value
SELECT MIN(price) FROM products;
SELECT MAX(price) FROM products;

-- ROUND: clean up decimal output
SELECT ROUND(AVG(price), 2) FROM products;  -- round to 2 decimal places

When you use an aggregate function without GROUP BY, it collapses the entire table into a single row:

sql
SELECT
  COUNT(*)           AS total_products,
  ROUND(AVG(price), 2) AS avg_price,
  MIN(price)         AS cheapest,
  MAX(price)         AS most_expensive,
  SUM(price)         AS total_value
FROM products;

--  total_products | avg_price | cheapest | most_expensive | total_value
-- ----------------+-----------+----------+----------------+-------------
--               5 |     74.99 |    39.99 |         129.99 |      374.95

`GROUP BY` — The Mental Model

GROUP BY is where most beginners get confused. The mental model:

PostgreSQL divides all your rows into groups based on the column(s) you specify
Each group is collapsed into a single output row
You can then apply aggregate functions to each group separately

sql
-- "How many products are in each category?"
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;

--    category   | product_count
-- --------------+---------------
--  accessories  |             3
--  peripherals  |             3

Visualise what happens:

Full table:
  Keyboard      | peripherals          ─┐
  Mouse         | peripherals           ├─ GROUP 1: peripherals (3 rows)
  Webcam        | peripherals          ─┘
  Monitor Stand | accessories          ─┐
  USB-C Hub     | accessories           ├─ GROUP 2: accessories (3 rows)
  Laptop Sleeve | accessories          ─┘

After GROUP BY category + COUNT(*):
  peripherals  | 3
  accessories  | 3

The Golden Rule of GROUP BY

Every column in your SELECT list must either be in the GROUP BY clause OR be wrapped in an aggregate function.

sql
-- ✅ CORRECT: category is in GROUP BY, COUNT(*) is an aggregate
SELECT category, COUNT(*) FROM products GROUP BY category;

-- ✅ CORRECT: category in GROUP BY, AVG is an aggregate
SELECT category, AVG(price) FROM products GROUP BY category;

-- ❌ ERROR: name is neither in GROUP BY nor an aggregate
SELECT category, name, COUNT(*) FROM products GROUP BY category;
-- ERROR: column "products.name" must appear in the GROUP BY clause
-- or be used in an aggregate function

This error trips up beginners constantly. The reason: if you group by category, and there are 3 rows in the "peripherals" group, which name value should PostgreSQL return? There are three of them. It cannot pick one arbitrarily — so it forces you to either group by name too (making each row unique) or use an aggregate.

Multiple columns in GROUP BY

sql
-- "How many in-stock vs out-of-stock products per category?"
SELECT category, in_stock, COUNT(*) AS count
FROM products
GROUP BY category, in_stock
ORDER BY category, in_stock;

--    category   | in_stock | count
-- --------------+----------+-------
--  accessories  | f        |     1
--  accessories  | t        |     2
--  peripherals  | t        |     3

When you GROUP BY multiple columns, each unique combination of those columns becomes one output row.

Common Aggregation Patterns

sql
-- Total value of inventory by category
SELECT
  category,
  COUNT(*) AS num_products,
  SUM(price) AS total_value,
  ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category
ORDER BY total_value DESC;

-- Cheapest and most expensive per category
SELECT
  category,
  MIN(price) AS cheapest,
  MAX(price) AS most_expensive,
  MAX(price) - MIN(price) AS price_range
FROM products
GROUP BY category;

-- Count in-stock vs out-of-stock across the whole catalog
SELECT
  in_stock,
  COUNT(*) AS count
FROM products
GROUP BY in_stock;

`HAVING` — Filtering Groups

WHERE filters individual rows before grouping. HAVING filters groups after aggregation.

sql
-- Categories with more than 2 products
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 2;

-- Categories where average price is above $50
SELECT category, ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 50;

The difference between WHERE and HAVING:

sql
-- WHERE: filter rows BEFORE grouping
-- "Count products per category, but only count products priced over $30"
SELECT category, COUNT(*) AS count
FROM products
WHERE price > 30          -- remove cheap products before counting
GROUP BY category;

-- HAVING: filter groups AFTER grouping
-- "Only show categories where the count is greater than 2"
SELECT category, COUNT(*) AS count
FROM products
GROUP BY category
HAVING COUNT(*) > 2;      -- remove groups with few products

-- Combined: WHERE filters rows, HAVING filters the resulting groups
SELECT category, COUNT(*) AS count
FROM products
WHERE price > 30          -- first: only count products over $30
GROUP BY category
HAVING COUNT(*) > 1;      -- then: only show categories with more than 1 qualifying product

The Execution Order

Understanding the order PostgreSQL executes a query helps predict how WHERE vs HAVING behaves:

1. FROM     — identify the table
2. WHERE    — filter individual rows
3. GROUP BY — divide remaining rows into groups
4. HAVING   — filter groups
5. SELECT   — compute output columns
6. ORDER BY — sort the result
7. LIMIT    — return only N rows

This is why you cannot use a column alias from SELECT in a WHERE clause — WHERE runs before SELECT:

sql
-- ❌ ERROR: 'avg_price' alias is not available in WHERE
SELECT category, AVG(price) AS avg_price
FROM products
WHERE avg_price > 50   -- ERROR: column "avg_price" does not exist
GROUP BY category;

-- ✅ CORRECT: use HAVING (which runs after SELECT-level computation)
-- Or repeat the expression:
SELECT category, AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 50;

`COUNT(*)` vs `COUNT(column)` vs `COUNT(DISTINCT column)`

These three are different:

sql
-- Setup: add a product with NULL description
INSERT INTO products (name, price) VALUES ('Mystery Box', 19.99);
-- This row has description = NULL

-- COUNT(*): counts every row, including those with NULLs
SELECT COUNT(*) FROM products;               -- 7 (includes Mystery Box)

-- COUNT(column): counts only rows where the column is NOT NULL
SELECT COUNT(description) FROM products;     -- 6 (excludes Mystery Box)

-- COUNT(DISTINCT column): counts unique non-NULL values
SELECT COUNT(DISTINCT category) FROM products;  -- 2 (peripherals, accessories)
-- Note: general category from the Mystery Box is also NULL, doesn't count

`COALESCE` and `NULLIF` in Aggregations

NULL values are silently ignored by aggregate functions — but this can surprise you:

sql
-- AVG ignores NULL values (doesn't count them in the denominator)
SELECT AVG(description) FROM products;   -- ignores rows where description is NULL

-- Replace NULL with a default value using COALESCE
SELECT
  name,
  COALESCE(description, 'No description provided') AS description
FROM products;

-- NULLIF: return NULL if two values are equal (useful to avoid division by zero)
SELECT
  category,
  SUM(price) / NULLIF(COUNT(*), 0) AS manual_avg
FROM products
GROUP BY category;
-- Without NULLIF: if somehow COUNT(*) were 0, you'd get a division by zero error
-- With NULLIF: returns NULL instead

Practical Exercise: Sales Analysis

Create a sales table and practice aggregation:

sql
CREATE TABLE sales (
  id          BIGSERIAL    PRIMARY KEY,
  product     TEXT         NOT NULL,
  region      TEXT         NOT NULL,
  quantity    INTEGER      NOT NULL,
  unit_price  NUMERIC(10,2) NOT NULL,
  sale_date   DATE         NOT NULL
);

INSERT INTO sales (product, region, quantity, unit_price, sale_date) VALUES
  ('Keyboard', 'North',   2, 129.99, '2026-01-15'),
  ('Mouse',    'North',   5,  49.99, '2026-01-15'),
  ('Keyboard', 'South',   1, 129.99, '2026-01-16'),
  ('Webcam',   'North',   3,  89.99, '2026-01-17'),
  ('Mouse',    'South',   4,  49.99, '2026-02-01'),
  ('Keyboard', 'East',    2, 129.99, '2026-02-03'),
  ('Webcam',   'South',   1,  89.99, '2026-02-10'),
  ('Mouse',    'East',    6,  49.99, '2026-02-15'),
  ('Keyboard', 'North',   3, 129.99, '2026-03-01'),
  ('Webcam',   'East',    2,  89.99, '2026-03-05');

Now answer these questions:

sql
-- Q1: Total revenue across all sales
SELECT SUM(quantity * unit_price) AS total_revenue FROM sales;

-- Q2: Total revenue by product, highest first
SELECT
  product,
  SUM(quantity * unit_price) AS revenue,
  SUM(quantity) AS units_sold
FROM sales
GROUP BY product
ORDER BY revenue DESC;

-- Q3: Total revenue by region
SELECT
  region,
  SUM(quantity * unit_price) AS revenue
FROM sales
GROUP BY region
ORDER BY revenue DESC;

-- Q4: Average order value per region
SELECT
  region,
  ROUND(AVG(quantity * unit_price), 2) AS avg_order_value,
  COUNT(*) AS num_orders
FROM sales
GROUP BY region;

-- Q5: Products that sold more than 5 units in total
SELECT product, SUM(quantity) AS total_units
FROM sales
GROUP BY product
HAVING SUM(quantity) > 5
ORDER BY total_units DESC;

-- Q6: Monthly revenue (group by year-month)
SELECT
  DATE_TRUNC('month', sale_date) AS month,
  SUM(quantity * unit_price) AS revenue
FROM sales
GROUP BY DATE_TRUNC('month', sale_date)
ORDER BY month;

-- Q7: Best-selling product per region
SELECT
  region,
  product,
  SUM(quantity) AS units
FROM sales
GROUP BY region, product
ORDER BY region, units DESC;

Summary

Concept	What it does
`COUNT(*)`	Count all rows
`COUNT(col)`	Count non-NULL values in column
`COUNT(DISTINCT col)`	Count unique non-NULL values
`SUM(col)`	Sum all values
`AVG(col)`	Average of all non-NULL values
`MIN(col)`	Smallest value
`MAX(col)`	Largest value
`GROUP BY col`	Collapse rows with same col value into one group
`HAVING condition`	Filter groups after aggregation
`WHERE` vs `HAVING`	WHERE filters rows before grouping; HAVING filters groups after
`COALESCE(val, default)`	Replace NULL with a default
`NULLIF(a, b)`	Return NULL if a equals b

The rule that saves beginners from errors: every non-aggregated column in SELECT must be in GROUP BY.

Module F-5 covers joins — the feature that makes relational databases powerful by connecting data spread across multiple tables.

Next: F-5 — Connecting Tables — Joins Demystified →

PreviousModule F-3: Filtering, Sorting, and Finding What You Need Next Module F-5: Connecting Tables — Joins Demystified