COUNT, SUM, AVG, GROUP BY, HAVING — the mental model that confuses beginners, explained from scratch.
F-4 — Aggregation — Summarising Your Data
Who this module is for: You completed F-3 and can filter and sort individual rows. Now you need to answer questions about groups of rows — "how many products are in each category?", "what is the average price?", "which categories have more than 5 products?". This module covers aggregation — the most conceptually tricky part of SQL for beginners.
The Problem Aggregation Solves
So far, every query has returned rows from the table — one row of output per row of input (or fewer after filtering). Aggregation is different: it collapses multiple rows into a single summary value.
Before aggregation (6 rows, 6 output rows):
Keyboard | peripherals | 129.99
Mouse | peripherals | 49.99
Monitor Stand| accessories | 39.99
USB-C Hub | accessories | 64.99
Laptop Sleeve| accessories | 24.99
Webcam | peripherals | 89.99
After aggregation (collapse to summary per category, 2 output rows):
peripherals | 3 products | avg $89.99
accessories | 3 products | avg $43.32
Aggregate Functions
These functions take a column of values and return a single summary value:
sql-- COUNT: how many rows? SELECT COUNT(*) FROM products; -- total rows SELECT COUNT(description) FROM products; -- rows where description is NOT NULL SELECT COUNT(DISTINCT category) FROM products; -- unique values -- SUM: total of a numeric column SELECT SUM(price) FROM products; -- AVG: average value SELECT AVG(price) FROM products; -- MIN / MAX: smallest or largest value SELECT MIN(price) FROM products; SELECT MAX(price) FROM products; -- ROUND: clean up decimal output SELECT ROUND(AVG(price), 2) FROM products; -- round to 2 decimal places
When you use an aggregate function without GROUP BY, it collapses the entire table into a single row:
sqlSELECT COUNT(*) AS total_products, ROUND(AVG(price), 2) AS avg_price, MIN(price) AS cheapest, MAX(price) AS most_expensive, SUM(price) AS total_value FROM products; -- total_products | avg_price | cheapest | most_expensive | total_value -- ----------------+-----------+----------+----------------+------------- -- 5 | 74.99 | 39.99 | 129.99 | 374.95
GROUP BY — The Mental Model
GROUP BY is where most beginners get confused. The mental model:
- PostgreSQL divides all your rows into groups based on the column(s) you specify
- Each group is collapsed into a single output row
- You can then apply aggregate functions to each group separately
sql-- "How many products are in each category?" SELECT category, COUNT(*) AS product_count FROM products GROUP BY category; -- category | product_count -- --------------+--------------- -- accessories | 3 -- peripherals | 3
Visualise what happens:
Full table:
Keyboard | peripherals ─┐
Mouse | peripherals ├─ GROUP 1: peripherals (3 rows)
Webcam | peripherals ─┘
Monitor Stand | accessories ─┐
USB-C Hub | accessories ├─ GROUP 2: accessories (3 rows)
Laptop Sleeve | accessories ─┘
After GROUP BY category + COUNT(*):
peripherals | 3
accessories | 3
The Golden Rule of GROUP BY
Every column in your SELECT list must either be in the GROUP BY clause OR be wrapped in an aggregate function.
sql-- ✅ CORRECT: category is in GROUP BY, COUNT(*) is an aggregate SELECT category, COUNT(*) FROM products GROUP BY category; -- ✅ CORRECT: category in GROUP BY, AVG is an aggregate SELECT category, AVG(price) FROM products GROUP BY category; -- ❌ ERROR: name is neither in GROUP BY nor an aggregate SELECT category, name, COUNT(*) FROM products GROUP BY category; -- ERROR: column "products.name" must appear in the GROUP BY clause -- or be used in an aggregate function
This error trips up beginners constantly. The reason: if you group by category, and there are 3 rows in the "peripherals" group, which name value should PostgreSQL return? There are three of them. It cannot pick one arbitrarily — so it forces you to either group by name too (making each row unique) or use an aggregate.
Multiple columns in GROUP BY
sql-- "How many in-stock vs out-of-stock products per category?" SELECT category, in_stock, COUNT(*) AS count FROM products GROUP BY category, in_stock ORDER BY category, in_stock; -- category | in_stock | count -- --------------+----------+------- -- accessories | f | 1 -- accessories | t | 2 -- peripherals | t | 3
When you GROUP BY multiple columns, each unique combination of those columns becomes one output row.
Common Aggregation Patterns
sql-- Total value of inventory by category SELECT category, COUNT(*) AS num_products, SUM(price) AS total_value, ROUND(AVG(price), 2) AS avg_price FROM products GROUP BY category ORDER BY total_value DESC; -- Cheapest and most expensive per category SELECT category, MIN(price) AS cheapest, MAX(price) AS most_expensive, MAX(price) - MIN(price) AS price_range FROM products GROUP BY category; -- Count in-stock vs out-of-stock across the whole catalog SELECT in_stock, COUNT(*) AS count FROM products GROUP BY in_stock;
HAVING — Filtering Groups
WHERE filters individual rows before grouping. HAVING filters groups after aggregation.
sql-- Categories with more than 2 products SELECT category, COUNT(*) AS product_count FROM products GROUP BY category HAVING COUNT(*) > 2; -- Categories where average price is above $50 SELECT category, ROUND(AVG(price), 2) AS avg_price FROM products GROUP BY category HAVING AVG(price) > 50;
The difference between WHERE and HAVING:
sql-- WHERE: filter rows BEFORE grouping -- "Count products per category, but only count products priced over $30" SELECT category, COUNT(*) AS count FROM products WHERE price > 30 -- remove cheap products before counting GROUP BY category; -- HAVING: filter groups AFTER grouping -- "Only show categories where the count is greater than 2" SELECT category, COUNT(*) AS count FROM products GROUP BY category HAVING COUNT(*) > 2; -- remove groups with few products -- Combined: WHERE filters rows, HAVING filters the resulting groups SELECT category, COUNT(*) AS count FROM products WHERE price > 30 -- first: only count products over $30 GROUP BY category HAVING COUNT(*) > 1; -- then: only show categories with more than 1 qualifying product
The Execution Order
Understanding the order PostgreSQL executes a query helps predict how WHERE vs HAVING behaves:
1. FROM — identify the table
2. WHERE — filter individual rows
3. GROUP BY — divide remaining rows into groups
4. HAVING — filter groups
5. SELECT — compute output columns
6. ORDER BY — sort the result
7. LIMIT — return only N rows
This is why you cannot use a column alias from SELECT in a WHERE clause — WHERE runs before SELECT:
sql-- ❌ ERROR: 'avg_price' alias is not available in WHERE SELECT category, AVG(price) AS avg_price FROM products WHERE avg_price > 50 -- ERROR: column "avg_price" does not exist GROUP BY category; -- ✅ CORRECT: use HAVING (which runs after SELECT-level computation) -- Or repeat the expression: SELECT category, AVG(price) AS avg_price FROM products GROUP BY category HAVING AVG(price) > 50;
COUNT(*) vs COUNT(column) vs COUNT(DISTINCT column)
These three are different:
sql-- Setup: add a product with NULL description INSERT INTO products (name, price) VALUES ('Mystery Box', 19.99); -- This row has description = NULL -- COUNT(*): counts every row, including those with NULLs SELECT COUNT(*) FROM products; -- 7 (includes Mystery Box) -- COUNT(column): counts only rows where the column is NOT NULL SELECT COUNT(description) FROM products; -- 6 (excludes Mystery Box) -- COUNT(DISTINCT column): counts unique non-NULL values SELECT COUNT(DISTINCT category) FROM products; -- 2 (peripherals, accessories) -- Note: general category from the Mystery Box is also NULL, doesn't count
COALESCE and NULLIF in Aggregations
NULL values are silently ignored by aggregate functions — but this can surprise you:
sql-- AVG ignores NULL values (doesn't count them in the denominator) SELECT AVG(description) FROM products; -- ignores rows where description is NULL -- Replace NULL with a default value using COALESCE SELECT name, COALESCE(description, 'No description provided') AS description FROM products; -- NULLIF: return NULL if two values are equal (useful to avoid division by zero) SELECT category, SUM(price) / NULLIF(COUNT(*), 0) AS manual_avg FROM products GROUP BY category; -- Without NULLIF: if somehow COUNT(*) were 0, you'd get a division by zero error -- With NULLIF: returns NULL instead
Practical Exercise: Sales Analysis
Create a sales table and practice aggregation:
sqlCREATE TABLE sales ( id BIGSERIAL PRIMARY KEY, product TEXT NOT NULL, region TEXT NOT NULL, quantity INTEGER NOT NULL, unit_price NUMERIC(10,2) NOT NULL, sale_date DATE NOT NULL ); INSERT INTO sales (product, region, quantity, unit_price, sale_date) VALUES ('Keyboard', 'North', 2, 129.99, '2026-01-15'), ('Mouse', 'North', 5, 49.99, '2026-01-15'), ('Keyboard', 'South', 1, 129.99, '2026-01-16'), ('Webcam', 'North', 3, 89.99, '2026-01-17'), ('Mouse', 'South', 4, 49.99, '2026-02-01'), ('Keyboard', 'East', 2, 129.99, '2026-02-03'), ('Webcam', 'South', 1, 89.99, '2026-02-10'), ('Mouse', 'East', 6, 49.99, '2026-02-15'), ('Keyboard', 'North', 3, 129.99, '2026-03-01'), ('Webcam', 'East', 2, 89.99, '2026-03-05');
Now answer these questions:
sql-- Q1: Total revenue across all sales SELECT SUM(quantity * unit_price) AS total_revenue FROM sales; -- Q2: Total revenue by product, highest first SELECT product, SUM(quantity * unit_price) AS revenue, SUM(quantity) AS units_sold FROM sales GROUP BY product ORDER BY revenue DESC; -- Q3: Total revenue by region SELECT region, SUM(quantity * unit_price) AS revenue FROM sales GROUP BY region ORDER BY revenue DESC; -- Q4: Average order value per region SELECT region, ROUND(AVG(quantity * unit_price), 2) AS avg_order_value, COUNT(*) AS num_orders FROM sales GROUP BY region; -- Q5: Products that sold more than 5 units in total SELECT product, SUM(quantity) AS total_units FROM sales GROUP BY product HAVING SUM(quantity) > 5 ORDER BY total_units DESC; -- Q6: Monthly revenue (group by year-month) SELECT DATE_TRUNC('month', sale_date) AS month, SUM(quantity * unit_price) AS revenue FROM sales GROUP BY DATE_TRUNC('month', sale_date) ORDER BY month; -- Q7: Best-selling product per region SELECT region, product, SUM(quantity) AS units FROM sales GROUP BY region, product ORDER BY region, units DESC;
Summary
| Concept | What it does |
|---|---|
COUNT(*) | Count all rows |
COUNT(col) | Count non-NULL values in column |
COUNT(DISTINCT col) | Count unique non-NULL values |
SUM(col) | Sum all values |
AVG(col) | Average of all non-NULL values |
MIN(col) | Smallest value |
MAX(col) | Largest value |
GROUP BY col | Collapse rows with same col value into one group |
HAVING condition | Filter groups after aggregation |
WHERE vs HAVING | WHERE filters rows before grouping; HAVING filters groups after |
COALESCE(val, default) | Replace NULL with a default |
NULLIF(a, b) | Return NULL if a equals b |
The rule that saves beginners from errors: every non-aggregated column in SELECT must be in GROUP BY.
Module F-5 covers joins — the feature that makes relational databases powerful by connecting data spread across multiple tables.
Next: F-5 — Connecting Tables — Joins Demystified →