DISTINCT and Aggregate Functions — COUNT, SUM, AVG, MAX, MIN

9 min read

So far every query has returned a list of rows. This lesson covers the queries that return a single summary number — how many users, total revenue, average order, biggest order ever. These are called aggregate functions, and they answer the questions PMs and managers ask most often: "how many?", "how much?", "what's the average?". Combined with DISTINCT, which collapses duplicates, they let you turn a table of thousands of rows into one or two numbers you can drop into a Slack message or a test report.

DISTINCT — remove duplicate values

Sometimes a column has many duplicate values and you want to see each unique one once. DISTINCT does that:

-- What roles exist in our users table?
SELECT DISTINCT role FROM users;

Result:

| role   |
|--------|
| admin  |
| tester |
| viewer |

Three roles, even though there are five users (three of them are testers). DISTINCT collapsed the duplicates.

Why testers love this: it's the fastest way to see every value a column contains, which catches data-quality bugs immediately. If a status column should only have 'pending', 'shipped', 'completed', 'cancelled' but SELECT DISTINCT status FROM orders shows you 'Done', ' completed' (with a leading space), or 'PENDING', you've found a real bug.

-- Every order status the system has ever stored
SELECT DISTINCT status FROM orders;
-- Result: cancelled, completed, pending, shipped
-- Every category that has at least one product
SELECT DISTINCT category_id FROM products;
-- Result: 1, 2, 3

The five aggregate functions

Five functions cover almost every "summary" question:

COUNT — how many rows?

The most common one. Three useful variants:

-- Count all rows in users
SELECT COUNT(*) FROM users;
-- Result: 5
 
-- Count active users only
SELECT COUNT(*) FROM users WHERE is_active = TRUE;
-- Result: 4
 
-- Count rows where email is set (excludes NULLs)
SELECT COUNT(email) FROM users;
-- Result: 5  -- everyone has an email in our data
 
-- Count unique roles
SELECT COUNT(DISTINCT role) FROM users;
-- Result: 3

Subtle but important: COUNT(*) counts every row including those with NULLs. COUNT(column) counts only rows where that column is not NULL. COUNT(DISTINCT column) counts unique non-NULL values. Pick the one that matches your question.

SUM — total of a numeric column

-- Total revenue across all orders
SELECT SUM(total) FROM orders;

Result:

| SUM(total) |
|------------|
|    601.50  |
-- Revenue from completed orders only
SELECT SUM(total) FROM orders WHERE status = 'completed';
-- Result: 193.50  (orders #1 + #2 = 145 + 48.50)

SUM ignores NULLs. If every value is NULL, it returns NULL (not 0) — a small gotcha for tests that assert on the result.

AVG — average value

-- Average order value
SELECT AVG(total) FROM orders;
-- Result: 120.30   (601.50 / 5)
 
-- Average product price
SELECT AVG(price) FROM products;
-- Result: 73.85

AVG also ignores NULLs — so an order with a NULL total isn't counted in the denominator. If your business logic requires NULLs to count as zero, you'd write AVG(COALESCE(total, 0)) instead. (COALESCE returns the first non-NULL argument — handy for null defaults.)

MAX and MIN — extremes

These work on numbers, dates, and even text (alphabetical max/min):

-- Most expensive product
SELECT MAX(price) FROM products;
-- Result: 320.00
 
-- Cheapest product
SELECT MIN(price) FROM products;
-- Result: 8.00
 
-- Most recent order date
SELECT MAX(created_at) FROM orders;
-- Result: 2026-05-05 09:11:08
 
-- The earliest signup
SELECT MIN(created_at) FROM users;
-- Result: 2026-04-12 09:14:00

A useful trick: MAX(created_at) is the same answer as ORDER BY created_at DESC LIMIT 1 in the timestamp column — but MAX returns just the timestamp, while the LIMIT version returns the whole row. Pick the right tool for the question.

Combining aggregates and aliases

You can compute several aggregates in one query, and rename the output columns with AS:

SELECT
    COUNT(*)    AS total_orders,
    SUM(total)  AS revenue,
    AVG(total)  AS avg_order_value,
    MAX(total)  AS largest_order,
    MIN(total)  AS smallest_order
FROM orders
WHERE status = 'completed';

Result:

| total_orders | revenue | avg_order_value | largest_order | smallest_order |
|--------------|---------|-----------------|---------------|----------------|
| 2            |  193.50 |          96.75  |        145.00 |          48.50 |

Five answers in one query. AS is optional but worth using — without it, the result columns would be named COUNT(*), SUM(total), etc., which are awkward to read (and impossible to reference from code that consumes the result).

QA use cases for aggregates

A handful of patterns you'll find yourself reaching for:

-- After a bulk-import test of 100 users, verify the count
SELECT COUNT(*) FROM users WHERE email LIKE '%@import-test.com';
-- Expected: 100
 
-- Sanity check: order total should equal sum of its line items
SELECT
    o.total                    AS recorded_total,
    SUM(oi.quantity * oi.price) AS computed_total
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.id = 1
GROUP BY o.id, o.total;
-- If recorded_total != computed_total, you have a real bug
 
-- Find the average response time from a test results table
SELECT AVG(response_time_ms) FROM test_runs WHERE suite = 'checkout';
-- Quick performance baseline
 
-- How many distinct testers filed bugs this month?
SELECT COUNT(DISTINCT reporter) FROM bugs
WHERE created_at >= '2026-05-01';

The JOIN and GROUP BY in the second example are coming up in Chapter 3. For now, focus on the aggregate piece.

NULL behaviour, in one line

A small reference card you'll wish you had:

FunctionNULL behaviour
COUNT(*)Counts NULLs.
COUNT(column)Skips NULLs.
SUM, AVGSkip NULLs. Return NULL if every row is NULL.
MAX, MINSkip NULLs.

If your test result is unexpectedly NULL, the most common cause is that every row had NULL in the column you aggregated.

⚠️ Common Mistakes

  • AVG over a column with NULLs. AVG skips NULL rows, so the denominator is smaller than you may think. If your test depends on a specific average, decide upfront whether NULLs should count as 0 (AVG(COALESCE(col, 0))) or be ignored (default).
  • COUNT(column) when you meant COUNT(*). They differ when the column has NULLs. For "how many rows match this filter," use COUNT(*).
  • Forgetting AS aliases on aggregate output. result["SUM(total)"] is a fragile column name in test code. Always alias: SUM(total) AS revenue, then reference result["revenue"].

🎯 Practice Task

25 minutes. All against the e-commerce database.

  1. How many rows are in products? How many are in stock? How many are out of stock? (Three queries — confirm the in-stock + out-of-stock counts add up to the total.)
  2. What is the cheapest product, and what is its price? Use MIN. Then use ORDER BY ... LIMIT 1 to get the whole row. Notice the difference.
  3. How much money has the system collected in total across all orders? Across only 'completed' orders?
  4. What is the average order total? What is the average price across all products?
  5. How many distinct users have placed at least one order? (Hint: COUNT(DISTINCT user_id) FROM orders.)
  6. Run SELECT DISTINCT status FROM orders; and write down every status value the system has stored. If a value surprises you, that's a finding worth noting.
  7. Stretch: Build a one-line "system health" query — total users, active users, total products, in-stock products, total orders, total revenue — using a single SELECT with aliases. Read the row and pretend you're handing it to a PM in Slack.

That wraps up Chapter 2. You can now write basic SELECT queries that filter, sort, limit, deduplicate, and summarise. Chapter 3 takes the next step: combining data from multiple tables with joins, plus the more advanced filtering operators (AND, OR, NOT, IN, BETWEEN) and GROUP BY.

// tip to track lessons you complete and pick up where you left off across devices.