You've spent six chapters on SQL. Now you're going to build the kind of artefact a real QA engineer hands their team: a SQL test pack that backs up the application's API and UI tests with direct database verification. The project is ShopEasy DB Pack — a folder of SQL files that seeds a test environment, verifies that application writes are landing correctly, sweeps the database for integrity issues, generates the reports a PM will read on Monday, and tears the test data down cleanly. Pull together everything from Chapters 1–6 in one deliverable. The next lesson is the guided walkthrough; this one is the brief.
The scenario
You're the QA engineer for ShopEasy — a mid-sized e-commerce platform. Engineering ships every week, and the team's API and UI tests have been finding fewer and fewer bugs. The bugs that do slip through are almost all data-layer issues: orders saving with the wrong total, inventory not decrementing, soft-deleted users still appearing in reports, duplicate rows from retried submits.
The team lead's brief is one paragraph:
"We're missing data bugs. Build me a SQL test pack: scripts that seed a known dataset, verification queries we can call from the API and UI tests, integrity sweeps that catch duplicates and orphans before customers do, and weekly reports for the PM. Keep it portable — same SQL should run on SQLite for local dev and PostgreSQL in CI. And every test must clean up after itself; nobody wants another year of garbage rows."
That's the whole scope. Five well-defined components, each one a pattern from this course.
The database
Same e-commerce schema you've used throughout the course — users, products, categories, orders, order_items. For the capstone, scale it up: aim for ~1000 rows total — 10 users, 5 categories, 30 products, 20 orders, 60 order items. That's enough volume to make queries non-trivial without making them slow on a local SQLite instance.
The schema and the original 5/3/10/5/15 seed data live in setting-up-practice-database.mdx (Chapter 1, Lesson 4). For the capstone you'll build on top of that — adding more rows, then writing the queries that operate on the larger dataset.
The deliverables
Five SQL files, each one targeting a specific phase of the test lifecycle.
- – INSERT 10 users, 5 categories, 30 products
- – INSERT 20 orders, 60 order_items
- – Wrap in a transaction; deterministic
- – Verify user registration
- – Verify order + items created together
- – Verify order total = SUM(items)
- – Verify inventory decremented
- – Verify soft delete (deactivated, not removed)
- – Find duplicate emails
- – Find orphaned order_items
- – Find orders with mismatched totals
- – Find users dormant >6 months
- – Find products with negative or zero prices
- Record counts per table –
- Orders per status per day –
- Revenue by category –
- Top 5 customers by spending –
- Avg order value, weekly trend –
- DELETE order_items first (children) –
- Then orders, then users, then products, then categories –
- Or TRUNCATE if your dialect supports it –
1. Data setup queries (01_setup.sql)
A single transaction that inserts:
- 5 categories — Electronics, Books, Home, Office, Outdoor.
- 10 users — mix of admins, testers, viewers; some active, some not. Each email follows a recognisable test-data pattern (
*+ci@qa.localor similar) so cleanup is safe. - 30 products — distributed across the 5 categories, with a realistic price range (£5–£500). At least 5 should be
in_stock = FALSEso your test data covers stock-related scenarios. - 20 orders — across the 10 users, with a spread of statuses:
pending,processing,shipped,completed,cancelled. Spread thecreated_atover the last 60 days so date-based reports have something to chew on. - 60 order items — 1–5 items per order. Crucially: at least two orders should have a deliberate mismatch between
orders.totalandSUM(item.price * item.quantity)— your integrity sweep needs something to find.
Wrap the whole thing in BEGIN; … COMMIT; so it's atomic. If any INSERT fails, none of them land — clean baseline guaranteed.
2. Verification queries (02_verification.sql) — 10 queries
Five from the section below, plus 5 of your own design that cover scenarios from your real workplace if you have one. Suggested set:
- Verify user registration — given
email = '?', return the row and confirm exactly one row exists with the expected values. - Verify order creation — given
order_id = ?, return the order and a count of its line items in one JOIN. Asserts the order and the items were saved. - Verify order total integrity — for a given order, compare
orders.totaltoSUM(quantity * price)fromorder_items. Return zero rows if consistent. - Verify inventory decrement — given a product id, return its
in_stockflag (andstock_countif your schema has one). After a "buy this product" test, the count should have moved. - Verify soft delete — given
user_id = ?, return the row showingis_active = FALSE(ordeleted_at IS NOT NULL, depending on your application's pattern).
Each of these is a query the API and UI tests will call after they perform a write. Treat them as the SQL equivalent of test assertions.
3. Data integrity checks (03_integrity.sql) — 5 queries
Each query returns "bad rows." A healthy database returns zero rows from each. Your job is to write these queries; running them is the team lead's release-gate check.
- Duplicate emails (
GROUP BY email HAVING COUNT(*) > 1). - Orphaned
order_items(LEFT JOINorders, WHEREo.id IS NULL). - Orders whose recorded total disagrees with the sum of items (Chapter 5, Lesson 3 has the shape).
- Users with no orders in the last 6 months — possible candidates for re-engagement campaigns or dormant-account cleanup.
- Products with
price <= 0— never legitimate; always a bug or a bad import.
4. Reporting queries (04_reporting.sql) — 5 queries
The PM-readable side of the test pack. Each query should produce a small, well-formatted result.
- Record counts per table — one-row dashboard with COUNT(*) for every table.
- Orders per status per day —
GROUP BY DATE(created_at), status, with a count. - Revenue by category — JOIN
products→order_items, GROUP BY category name, SUM the line items. - Top 5 customers by spending — JOIN
users→orders, GROUP BY user, ORDER BY total spent DESC, LIMIT 5. - Average order value, weekly trend — bucket orders by week, AVG(total) per week, sorted oldest to newest.
5. Cleanup queries (05_cleanup.sql)
Reverse of setup. Delete in dependency order — order_items first, then orders, then products, users, categories. Wrap in a transaction.
If you have a dedicated test database where you control the whole environment, TRUNCATE is faster (Chapter 4, Lesson 3) — but DELETE … WHERE is safer because it can be filtered to only the rows your setup inserted. For a shared CI database, prefer DELETE with a recognisable test-data pattern (e.g., WHERE email LIKE '%+ci@qa.local').
Stretch goals
If you finish the core pack ahead of schedule, three extensions teams genuinely use:
- Add three views.
active_users,pending_orders_with_customer,category_revenue— each one a saved SELECT from your reporting queries. Your verification scripts can thenSELECT * FROM active_usersinstead of repeating the JOIN. - Write a
reset_test_datastored procedure. Wrap your cleanup script. Now CI can callCALL reset_test_data();instead of executing five DELETE statements. - Window-function trend queries. Use
ROW_NUMBER()to rank each user's orders by date; useLAG(total) OVER (ORDER BY week)to compute week-over-week revenue change. Both are queries a PM will love.
Constraints worth honouring
A few principles your team lead will check:
- Portable SQL. Should run on SQLite and PostgreSQL with at most a couple of dialect-specific lines clearly commented (
-- PG only:etc.). - Idempotent. Running the setup twice on the same database should not double the data — your script should clean up first or use stable identifiers.
- Comments where they add value. The why of a query, not the what.
/* Find orders whose total drifts from items — see playbook X */is better than restating what the SQL already says.
That's the whole brief. Five files, around 25 queries total. Next lesson walks you through writing each component, with full SQL and expected results.