Reading and Writing CSV Files

8 min read

CSV is the unglamorous workhorse of QA data. Test cases exported from a spreadsheet, fixtures generated by a backend team, run reports consumed by management dashboards — they're all CSV. Python's built-in csv module handles every awkward corner of the format (commas inside quoted fields, escaped quotes, trailing whitespace) so you don't have to roll your own parser. This lesson covers the four main shapes you'll meet: reading rows as lists, reading rows as dicts (the form you'll actually use), writing rows from lists, and writing rows from dicts.

What CSV looks like

A typical test data file:

name,email,role
Alice,alice@test.com,admin
Bob,bob@test.com,tester
Carol,carol@test.com,viewer

The first line is the header row — column names. Each subsequent line is a record. Cells are separated by commas; if a cell itself contains a comma, the cell is wrapped in double quotes. Naively splitting on commas breaks the moment a field contains one — which is why import csv exists.

Reading rows as lists — csv.reader

import csv
 
with open("test_data.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)            # consume the first row as headers
    print(header)                    # ['name', 'email', 'role']
    for row in reader:
        print(row)                   # each row is a list of strings

A few details:

  • newline="" — pass this when you open a CSV. The csv module handles its own line endings; if you let Python's default newline translation interfere, you can get blank rows on Windows.
  • csv.reader(f) — wraps the file object in a reader that yields one list per row. Every cell is a string, even if it looks like a number.
  • next(reader) — pulls the first row. We use it to peel off the header so the loop only sees data rows.

For small files this works, but you're indexing rows by position (row[0], row[1], …), which is fragile — add a column to the CSV and every index shifts.

Reading rows as dicts — csv.DictReader (the one you'll use)

DictReader reads the header automatically and yields each row as a dict keyed by column name. This is what you should reach for 95% of the time:

import csv
 
with open("test_data.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"User: {row['name']:<10} Email: {row['email']}")

Output:

User: Alice      Email: alice@test.com
User: Bob        Email: bob@test.com
User: Carol      Email: carol@test.com

Every row is a dict like {"name": "Alice", "email": "alice@test.com", "role": "admin"}. Add a new column to the CSV and existing code keeps working — that resilience is why DictReader wins.

Reminder: every value DictReader hands you is a string. If a column holds numbers, convert with int(row["age"]) or float(row["price"]) before doing maths.

Loading the whole CSV into memory

For data-driven tests you usually want every row available before you start running:

import csv
 
with open("test_data.csv", "r", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
 
print(f"Loaded {len(rows)} rows")
print(rows[0])      # the first row as a dict

Wrapping a DictReader in list(...) materialises every row into a list of dicts. Fine for files up to tens of thousands of rows; if you have millions, iterate one row at a time as in the previous example.

Writing rows from lists — csv.writer

import csv
 
with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["test_name", "status", "duration_ms"])
    writer.writerow(["login_test",   "PASS", 1250])
    writer.writerow(["search_test",  "FAIL", 3400])
    writer.writerow(["logout_test",  "PASS",  410])

Open the resulting file:

test_name,status,duration_ms
login_test,PASS,1250
search_test,FAIL,3400
logout_test,PASS,410

csv.writer accepts any iterable as a row — Python converts each value to a string for you. Numbers go through unchanged; strings with commas are quoted automatically.

For a list of rows in one call, use writer.writerows([...]):

rows = [
    ["login_test",  "PASS", 1250],
    ["search_test", "FAIL", 3400],
]
writer.writerows(rows)

Writing rows from dicts — csv.DictWriter (the one you'll use)

When your in-memory data is dicts (results from tests, parsed JSON, etc.), DictWriter is the cleanest fit:

import csv
 
results = [
    {"name": "login_test",   "status": "PASS", "duration_ms": 1250},
    {"name": "search_test",  "status": "FAIL", "duration_ms": 3400},
    {"name": "logout_test",  "status": "PASS", "duration_ms":  410},
]
 
with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "status", "duration_ms"])
    writer.writeheader()
    writer.writerows(results)

Three steps:

  1. Pass fieldnames= — the column order in the output. DictWriter reads each dict and writes those keys in that order.
  2. Call writeheader() once to emit the header row.
  3. writerow(d) or writerows([...]) for the data.

A dict with extra keys raises ValueError by default; pass extrasaction="ignore" to skip them silently. A dict missing a key gets an empty cell unless you also pass restval=....

The Windows newline gotcha

You'll see this line on every CSV tutorial:

open("results.csv", "w", newline="", encoding="utf-8")

The newline="" matters on Windows. By default, Python's text writer translates \n to \r\n. The csv writer also emits \r\n between rows. Combine the two and you get \r\r\n — which Excel and other readers display as blank rows between records. Pass newline="" and the doubling stops.

Tip: keep this in muscle memory. Even on macOS or Linux where you'd never hit the bug, including newline="" makes the script behave the same when a teammate runs it on Windows.

A QA example — read inputs, run, write outputs

A complete data-driven test loop: read test_data.csv, "run" each row, write results.csv.

import csv
from pathlib import Path
 
def run_test(user: dict) -> dict:
    """Pretend to run a test for a user. Returns a result row."""
    duration = len(user["name"]) * 120
    status = "PASS" if user["role"] != "viewer" else "FAIL"
    return {
        "name":         f"login_as_{user['name']}",
        "status":       status,
        "duration_ms":  duration,
        "tested_role":  user["role"]
    }
 
input_path  = Path("fixtures/test_data.csv")
output_path = Path("output/results.csv")
output_path.parent.mkdir(parents=True, exist_ok=True)
 
with input_path.open("r", newline="", encoding="utf-8") as f:
    users = list(csv.DictReader(f))
 
results = [run_test(u) for u in users]
 
with output_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "status", "duration_ms", "tested_role"])
    writer.writeheader()
    writer.writerows(results)
 
print(f"Ran {len(users)} tests; wrote {output_path}")

That's the entire pattern: DictReader in, list comprehension to compute results, DictWriter out. The same structure scales from three rows to thirty thousand.

A CSV mapped to its DictReader output

The header row supplies the keys; every subsequent row becomes a dict mapping those keys to that row's strings. Add a new column to the CSV and your code keeps working — that's the win over indexing rows by position.

Other CSV details worth knowing

A few that come up but don't deserve their own section:

  • Different delimiters. csv.reader(f, delimiter=";") or delimiter="\t" for semicolon- or tab-separated files. European Excel exports often use ;.
  • Comment lines. Strip them yourself before feeding to the reader: lines = [l for l in f if not l.startswith("#")], then csv.reader(lines).
  • Skip the header without DictReader. next(reader) eats one row.
  • For real spreadsheets (xlsx) Python ships nothing built-in. Install openpyxl (pip install openpyxl). CSV is usually enough for QA fixtures.

⚠️ Common mistakes

  • Splitting on commas manually. for line in f: cells = line.strip().split(",") looks fine until a CSV cell contains a comma inside quotes — "Smith, John" becomes two cells. The csv module handles quoting for you. Use it.
  • Forgetting newline="". On Windows you'll get blank rows between every record. The fix is one extra argument when opening the file: open(path, "w", newline="").
  • Forgetting to convert numbers. row["duration_ms"] from a DictReader is a string"1250", not 1250. Comparisons and arithmetic produce wrong answers without an explicit int(row["duration_ms"]). Convert at the boundary.

🎯 Practice task

Run a small data-driven test loop. 25-30 minutes.

  1. Create fixtures/test_data.csv with at least five rows. Columns: name, email, role (mix of admin, tester, viewer).
  2. Create csv_runner.py. Import csv and pathlib.Path.
  3. Read the CSV with csv.DictReader into users = list(...).
  4. Print len(users) and the first row's name and role.
  5. Define def run_test(user: dict) -> dict: returning a result dict with at least name, status, duration_ms. Pick the status based on the user's role (e.g. viewers fail, others pass — totally up to you).
  6. Build results = [run_test(u) for u in users] and print the count.
  7. Write output/results.csv with csv.DictWriter. Use fieldnames=["name", "status", "duration_ms"]. Don't forget newline="".
  8. Reopen the output file with DictReader to verify it reads back cleanly. Print each row.
  9. Stretch 1: mid-loop, convert one column type — read row["duration_ms"] as int(row["duration_ms"]) once you've written results, and compute the average duration across all results. Print it.
  10. Stretch 2: add a passed_only output file. Use [r for r in results if r["status"] == "PASS"] and write that filtered list to output/passed.csv.

You can now read and write the format every QA spreadsheet uses. The next lesson leaves the local disk and reaches across the network — making HTTP requests with the requests library.

// tip to track lessons you complete and pick up where you left off across devices.