Q11 of 26 · SQL

What is a subquery and when would you use one instead of a JOIN in QA work?

SQLMidsqlsubqueriesnot-innot-existscorrelated

Short answer

Short answer: A subquery is a SELECT nested inside another query. Use it when the inner result is a single value or a list that filters the outer query, and when a JOIN would produce duplicate rows that are hard to deduplicate.

Detail

Subqueries come in three flavours:

Scalar subquery — returns a single value, used in SELECT or WHERE:

-- Orders whose total exceeds the average order amount
SELECT order_id, total_amount
FROM   orders
WHERE  total_amount > (SELECT AVG(total_amount) FROM orders);

List subquery — returns a column of values, used with IN/NOT IN:

-- Users who have NEVER placed an order
SELECT id, email
FROM   users
WHERE  id NOT IN (SELECT DISTINCT customer_id FROM orders);

Correlated subquery — references the outer query, runs once per outer row:

-- Most recent order date per user
SELECT u.id, u.email,
  (SELECT MAX(created_at) FROM orders o WHERE o.customer_id = u.id) AS last_order
FROM users u;

JOIN vs subquery: JOINs are generally faster and the query planner optimises them well. Subqueries are clearer when the inner logic is a filter condition rather than a data source. NOT IN subqueries have a NULL gotcha — if any value in the subquery result is NULL, the whole NOT IN returns no rows; NOT EXISTS is safer.

// EXAMPLE

-- Users with no orders (safer than NOT IN when customer_id can be NULL)
SELECT id, email
FROM   users u
WHERE  NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = u.id
);

// WHAT INTERVIEWERS LOOK FOR

The three types, at least one QA example, and the NOT IN NULL gotcha. Knowing when NOT EXISTS is safer than NOT IN is a strong signal.

// COMMON PITFALL

NOT IN with a subquery that returns any NULL rows silently returns an empty result set — use NOT EXISTS instead.