Q4 of 26 · SQL

How do GROUP BY and HAVING work, and how would you use them to find duplicate records?

SQLJuniorsqlgroup-byhavingduplicatesaggregation

Short answer

Short answer: GROUP BY collapses rows into groups by one or more columns and lets you apply aggregate functions. HAVING filters those groups — like WHERE but applied after aggregation.

Detail

GROUP BY is the building block for spotting duplicates and summarising test data. HAVING is WHERE for aggregated results.

Finding duplicates — the classic pattern:

SELECT email, COUNT(*) AS cnt
FROM   users
GROUP BY email
HAVING COUNT(*) > 1;

This returns every email address that appears more than once — meaning duplicate user registrations slipped through.

Full duplicate rows — if you want the actual IDs involved, wrap it in a subquery or CTE:

SELECT *
FROM   users
WHERE  email IN (
  SELECT email
  FROM   users
  GROUP BY email
  HAVING COUNT(*) > 1
);

Other QA uses of GROUP BY + HAVING:

  • Find test runs where more than N tests failed
  • Find products with zero inventory (HAVING COUNT = 0 or SUM = 0)
  • Validate that every user has exactly one active session

// EXAMPLE

-- Emails registered more than once
SELECT email, COUNT(*) AS duplicate_count
FROM   users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;

-- Orders with more than 10 line items (potential data anomaly)
SELECT order_id, COUNT(*) AS line_count
FROM   order_items
GROUP BY order_id
HAVING COUNT(*) > 10;

// WHAT INTERVIEWERS LOOK FOR

The HAVING COUNT(*) > 1 duplicate pattern is a must-know. Explain that WHERE filters before aggregation and HAVING filters after.

// COMMON PITFALL

Trying to use WHERE on an aggregate (WHERE COUNT(*) > 1) — that's a syntax error. HAVING is the right clause.