CSV is the unglamorous workhorse of QA data. Test cases exported from a spreadsheet, fixtures generated by a backend team, run reports consumed by management dashboards — they're all CSV. Python's built-in csv module handles every awkward corner of the format (commas inside quoted fields, escaped quotes, trailing whitespace) so you don't have to roll your own parser. This lesson covers the four main shapes you'll meet: reading rows as lists, reading rows as dicts (the form you'll actually use), writing rows from lists, and writing rows from dicts.
What CSV looks like
A typical test data file:
name,email,role
Alice,alice@test.com,admin
Bob,bob@test.com,tester
Carol,carol@test.com,viewer
The first line is the header row — column names. Each subsequent line is a record. Cells are separated by commas; if a cell itself contains a comma, the cell is wrapped in double quotes. Naively splitting on commas breaks the moment a field contains one — which is why import csv exists.
Reading rows as lists — csv.reader
import csv
with open("test_data.csv", "r", newline="", encoding="utf-8") as f:
reader = csv.reader(f)
header = next(reader) # consume the first row as headers
print(header) # ['name', 'email', 'role']
for row in reader:
print(row) # each row is a list of stringsA few details:
newline=""— pass this when you open a CSV. Thecsvmodule handles its own line endings; if you let Python's default newline translation interfere, you can get blank rows on Windows.csv.reader(f)— wraps the file object in a reader that yields one list per row. Every cell is a string, even if it looks like a number.next(reader)— pulls the first row. We use it to peel off the header so the loop only sees data rows.
For small files this works, but you're indexing rows by position (row[0], row[1], …), which is fragile — add a column to the CSV and every index shifts.
Reading rows as dicts — csv.DictReader (the one you'll use)
DictReader reads the header automatically and yields each row as a dict keyed by column name. This is what you should reach for 95% of the time:
import csv
with open("test_data.csv", "r", newline="", encoding="utf-8") as f:
reader = csv.DictReader(f)
for row in reader:
print(f"User: {row['name']:<10} Email: {row['email']}")Output:
User: Alice Email: alice@test.com
User: Bob Email: bob@test.com
User: Carol Email: carol@test.com
Every row is a dict like {"name": "Alice", "email": "alice@test.com", "role": "admin"}. Add a new column to the CSV and existing code keeps working — that resilience is why DictReader wins.
Reminder: every value DictReader hands you is a string. If a column holds numbers, convert with
int(row["age"])orfloat(row["price"])before doing maths.
Loading the whole CSV into memory
For data-driven tests you usually want every row available before you start running:
import csv
with open("test_data.csv", "r", newline="", encoding="utf-8") as f:
rows = list(csv.DictReader(f))
print(f"Loaded {len(rows)} rows")
print(rows[0]) # the first row as a dictWrapping a DictReader in list(...) materialises every row into a list of dicts. Fine for files up to tens of thousands of rows; if you have millions, iterate one row at a time as in the previous example.
Writing rows from lists — csv.writer
import csv
with open("results.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
writer.writerow(["test_name", "status", "duration_ms"])
writer.writerow(["login_test", "PASS", 1250])
writer.writerow(["search_test", "FAIL", 3400])
writer.writerow(["logout_test", "PASS", 410])Open the resulting file:
test_name,status,duration_ms
login_test,PASS,1250
search_test,FAIL,3400
logout_test,PASS,410
csv.writer accepts any iterable as a row — Python converts each value to a string for you. Numbers go through unchanged; strings with commas are quoted automatically.
For a list of rows in one call, use writer.writerows([...]):
rows = [
["login_test", "PASS", 1250],
["search_test", "FAIL", 3400],
]
writer.writerows(rows)Writing rows from dicts — csv.DictWriter (the one you'll use)
When your in-memory data is dicts (results from tests, parsed JSON, etc.), DictWriter is the cleanest fit:
import csv
results = [
{"name": "login_test", "status": "PASS", "duration_ms": 1250},
{"name": "search_test", "status": "FAIL", "duration_ms": 3400},
{"name": "logout_test", "status": "PASS", "duration_ms": 410},
]
with open("results.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=["name", "status", "duration_ms"])
writer.writeheader()
writer.writerows(results)Three steps:
- Pass
fieldnames=— the column order in the output. DictWriter reads each dict and writes those keys in that order. - Call
writeheader()once to emit the header row. writerow(d)orwriterows([...])for the data.
A dict with extra keys raises ValueError by default; pass extrasaction="ignore" to skip them silently. A dict missing a key gets an empty cell unless you also pass restval=....
The Windows newline gotcha
You'll see this line on every CSV tutorial:
open("results.csv", "w", newline="", encoding="utf-8")The newline="" matters on Windows. By default, Python's text writer translates \n to \r\n. The csv writer also emits \r\n between rows. Combine the two and you get \r\r\n — which Excel and other readers display as blank rows between records. Pass newline="" and the doubling stops.
Tip: keep this in muscle memory. Even on macOS or Linux where you'd never hit the bug, including
newline=""makes the script behave the same when a teammate runs it on Windows.
A QA example — read inputs, run, write outputs
A complete data-driven test loop: read test_data.csv, "run" each row, write results.csv.
import csv
from pathlib import Path
def run_test(user: dict) -> dict:
"""Pretend to run a test for a user. Returns a result row."""
duration = len(user["name"]) * 120
status = "PASS" if user["role"] != "viewer" else "FAIL"
return {
"name": f"login_as_{user['name']}",
"status": status,
"duration_ms": duration,
"tested_role": user["role"]
}
input_path = Path("fixtures/test_data.csv")
output_path = Path("output/results.csv")
output_path.parent.mkdir(parents=True, exist_ok=True)
with input_path.open("r", newline="", encoding="utf-8") as f:
users = list(csv.DictReader(f))
results = [run_test(u) for u in users]
with output_path.open("w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=["name", "status", "duration_ms", "tested_role"])
writer.writeheader()
writer.writerows(results)
print(f"Ran {len(users)} tests; wrote {output_path}")That's the entire pattern: DictReader in, list comprehension to compute results, DictWriter out. The same structure scales from three rows to thirty thousand.
A CSV mapped to its DictReader output
CSV rows ↔ DictReader dicts
| name | role | ||
|---|---|---|---|
| header (row 0) | 'name' — column key | 'email' — column key | 'role' — column key |
| row 1 | row['name'] = 'Alice' | row['email'] = 'alice@test.com' | row['role'] = 'admin' |
| row 2 | row['name'] = 'Bob' | row['email'] = 'bob@test.com' | row['role'] = 'tester' |
| row 3 | row['name'] = 'Carol' | row['email'] = 'carol@test.com' | row['role'] = 'viewer' |
The header row supplies the keys; every subsequent row becomes a dict mapping those keys to that row's strings. Add a new column to the CSV and your code keeps working — that's the win over indexing rows by position.
Other CSV details worth knowing
A few that come up but don't deserve their own section:
- Different delimiters.
csv.reader(f, delimiter=";")ordelimiter="\t"for semicolon- or tab-separated files. European Excel exports often use;. - Comment lines. Strip them yourself before feeding to the reader:
lines = [l for l in f if not l.startswith("#")], thencsv.reader(lines). - Skip the header without DictReader.
next(reader)eats one row. - For real spreadsheets (xlsx) Python ships nothing built-in. Install
openpyxl(pip install openpyxl). CSV is usually enough for QA fixtures.
⚠️ Common mistakes
- Splitting on commas manually.
for line in f: cells = line.strip().split(",")looks fine until a CSV cell contains a comma inside quotes —"Smith, John"becomes two cells. Thecsvmodule handles quoting for you. Use it. - Forgetting
newline="". On Windows you'll get blank rows between every record. The fix is one extra argument when opening the file:open(path, "w", newline=""). - Forgetting to convert numbers.
row["duration_ms"]from a DictReader is a string —"1250", not1250. Comparisons and arithmetic produce wrong answers without an explicitint(row["duration_ms"]). Convert at the boundary.
🎯 Practice task
Run a small data-driven test loop. 25-30 minutes.
- Create
fixtures/test_data.csvwith at least five rows. Columns:name,email,role(mix ofadmin,tester,viewer). - Create
csv_runner.py. Importcsvandpathlib.Path. - Read the CSV with
csv.DictReaderintousers = list(...). - Print
len(users)and the first row'snameandrole. - Define
def run_test(user: dict) -> dict:returning a result dict with at leastname,status,duration_ms. Pick the status based on the user's role (e.g. viewers fail, others pass — totally up to you). - Build
results = [run_test(u) for u in users]and print the count. - Write
output/results.csvwithcsv.DictWriter. Usefieldnames=["name", "status", "duration_ms"]. Don't forgetnewline="". - Reopen the output file with DictReader to verify it reads back cleanly. Print each row.
- Stretch 1: mid-loop, convert one column type — read
row["duration_ms"]asint(row["duration_ms"])once you've written results, and compute the average duration across all results. Print it. - Stretch 2: add a
passed_onlyoutput file. Use[r for r in results if r["status"] == "PASS"]and write that filtered list tooutput/passed.csv.
You can now read and write the format every QA spreadsheet uses. The next lesson leaves the local disk and reaches across the network — making HTTP requests with the requests library.