ReferenceAdvanced5-7 min reference

ETL Testing

ETL (Extract, Transform, Load) pipelines move and reshape data between systems — and silently corrupt reports when a transform is wrong. Unlike a one-off migration, pipelines run repeatedly, so tests must be repeatable too. This sheet covers what to check at each stage; it's close kin to Data Migration Testing (linked below).

Test at each stage

Stage	Verify
Extract	All expected source rows pulled; correct filters/incremental window; source unchanged
Transform	Business rules applied correctly; joins, aggregations, lookups, dedup
Load	Target row counts; upserts vs inserts; no duplicates; constraints hold

Data-quality dimensions

Dimension	Asks
Completeness	All rows/fields present?
Accuracy	Values match the rule/source of truth?
Consistency	Same value across tables/feeds?
Uniqueness	No unintended duplicates?
Validity	Types, ranges, formats correct?
Timeliness	Data fresh / within SLA?

Pipeline-specific checks

Idempotency / re-runs: running twice doesn't double-load.
Incremental loads: only new/changed rows; watermark correct.
Late / out-of-order data handled.
Failure & restart: a mid-run failure recovers without partial corruption.
Schema drift: a new/renamed source column is detected, not silently dropped.

Common mistakes

Testing only the final table, never the transform logic that produced it.
Comparing averages instead of row-level reconciliation (masks offsetting errors).
Ignoring incremental-load and re-run idempotency (double counting).
No alerting on data-quality failures — bad data reaches dashboards unnoticed.
Tiny test data that never exercises late/duplicate/null edge cases.

// Related resources

QA Process

Test data management guide

Common Bugs

Data bug patterns