How do GROUP BY and HAVING work, and how would you use them to find duplicate records?

Question

Accepted Answer

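GROUP BY collapses rows into groups that share the same values in one or more columns, so that aggregate functions such as COUNT, SUM, or MAX return one result per group. HAVING filters those groups: it works like WHERE, but it is applied after aggregation, so it can test aggregate values, whereas WHERE filters individual rows before they are grouped. GROUP BY is the building block for spotting duplicates and summarising test data.

Finding duplicates, the classic pattern: group by the column that should be unique, select it together with COUNT(*), and keep only the groups where HAVING COUNT(*) > 1. Grouped by email on a table of user registrations, this returns every email address that appears more than once, meaning duplicate user registrations slipped through.

Full duplicate rows: if you want the actual IDs involved, wrap the grouped query in a subquery or CTE and join it back to the original table.

Other QA uses of GROUP BY + HAVING:
- Find test runs where more than N tests failed.
- Find products with zero inventory (a HAVING condition that the summed stock equals 0). Note that HAVING COUNT(*) = 0 never matches in a plain GROUP BY, because a group only exists if it contains at least one row; counting down to zero needs a LEFT JOIN from the products table and a COUNT over a column of the joined table.
- Validate that every user has exactly one active session (HAVING COUNT(*) > 1 over active sessions flags users with more than one; users with none need the same LEFT JOIN approach to show up).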
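The duplicate-finding pattern described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the users table, its id and email columns, and the sample rows are all assumptions invented for the example, and Python's built-in sqlite3 module is used only so the SQL can be run without any setup. The two queries themselves use standard GROUP BY, HAVING, and CTE syntax.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    INSERT INTO users (id, email) VALUES
        (1, 'ann@example.com'),
        (2, 'bob@example.com'),
        (3, 'ann@example.com'),
        (4, 'cat@example.com'),
        (5, 'bob@example.com'),
        (6, 'ann@example.com');
""")

# Classic pattern: group by the column that should be unique,
# then keep only the groups that contain more than one row.
duplicate_emails = conn.execute("""
    SELECT email, COUNT(*) AS occurrences
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
    ORDER BY email
""").fetchall()

# Full duplicate rows: a CTE names the duplicated emails, and a join
# back to the table recovers the ids of every row involved.
duplicate_rows = conn.execute("""
    WITH dupes AS (
        SELECT email
        FROM users
        GROUP BY email
        HAVING COUNT(*) > 1
    )
    SELECT u.id, u.email
    FROM users AS u
    JOIN dupes AS d ON d.email = u.email
    ORDER BY u.email, u.id
""").fetchall()

print(duplicate_emails)
print(duplicate_rows)
conn.close()
```

With this sample data the first query reports ann@example.com three times and bob@example.com twice, and the second lists the five ids behind those duplicates; cat@example.com appears once and is filtered out by HAVING. The condition cannot be moved into WHERE, because COUNT(*) does not exist until the rows have been grouped.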
