Reading Test Data from JSON and CSV

Excel earns its place when non-developers own the test data. For everything else — and that's most cases — JSON or CSV is the right choice. Both are text formats: Git diffs them properly, code review can annotate them, parsing is fast, and the tools are standard. JSON wins for nested or typed data. CSV wins for flat tabular data that anyone can edit. This lesson builds the same login-data suite from the previous lesson in both formats, shows the Jackson/CSV-reader plumbing, and gives you the decision rubric for picking the right format on a real project.

JSON test data — the typed answer

JSON pairs naturally with Jackson — the same library you used in Core Java for QA's exception-handling lessons:

src/test/resources/testdata/login-scenarios.json:

[
  { "username": "standard_user", "password": "secret_sauce", "shouldSucceed": true, "expectedFragment": "/inventory.html" },
  { "username": "locked_out_user", "password": "secret_sauce", "shouldSucceed": false, "expectedFragment": "locked out" },
  { "username": "problem_user", "password": "secret_sauce", "shouldSucceed": true, "expectedFragment": "/inventory.html" },
  { "username": "performance_glitch_user", "password": "secret_sauce", "shouldSucceed": true, "expectedFragment": "/inventory.html" },
  { "username": "", "password": "secret_sauce", "shouldSucceed": false, "expectedFragment": "Username is required" },
  { "username": "standard_user", "password": "", "shouldSucceed": false, "expectedFragment": "Password is required" }
]

Add Jackson to pom.xml if it's not already there:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.17.2</version>
</dependency>

A typed POJO that mirrors each row:

package com.mycompany.tests.data;
 
public class LoginScenario {
    public String username;
    public String password;
    public boolean shouldSucceed;
    public String expectedFragment;
}

The @DataProvider reads the file once and returns an array of typed scenarios:

import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
 
@DataProvider(name = "jsonLoginData")
public Object[][] jsonLoginData() throws IOException {
    LoginScenario[] data = new ObjectMapper().readValue(
        new File("src/test/resources/testdata/login-scenarios.json"),
        LoginScenario[].class
    );
    return Arrays.stream(data)
        .map(s -> new Object[]{ s })          // wrap each scenario as a single argument
        .toArray(Object[][]::new);
}
 
@Test(dataProvider = "jsonLoginData")
public void shouldHandleLoginScenario(LoginScenario s) {
    LoginPage login = new LoginPage(driver).navigateTo();
    login.loginAs(s.username, s.password);
 
    if (s.shouldSucceed) {
        Assert.assertTrue(driver.getCurrentUrl().contains(s.expectedFragment));
    } else {
        Assert.assertTrue(login.errorText().contains(s.expectedFragment));
    }
}

A single typed parameter per test invocation — no (String) row[0] casting, full IDE autocomplete on the POJO fields. Adding a field is a one-line POJO edit and a JSON edit; tests pick up the change without any @DataProvider change.

CSV test data — the simplest answer

CSV is the format every spreadsheet, every database, every analytics tool can export. For flat tabular data, it's hard to beat:

src/test/resources/testdata/login-scenarios.csv:

username,password,shouldSucceed,expectedFragment
standard_user,secret_sauce,true,/inventory.html
locked_out_user,secret_sauce,false,locked out
problem_user,secret_sauce,true,/inventory.html
performance_glitch_user,secret_sauce,true,/inventory.html
,secret_sauce,false,Username is required
standard_user,,false,Password is required

The simplest reader uses no extra dependency:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
 
public class CsvReader {
    public static Object[][] read(String path) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String header = br.readLine();   // skip header — could parse to validate columns
            String line;
            while ((line = br.readLine()) != null) {
                rows.add(line.split(",", -1)); // -1 keeps trailing empties
            }
        }
        return rows.toArray(new Object[0][]);
    }
}

The hand-rolled split(",") is fine for clean data — no embedded commas, no quoted fields. For real-world CSVs that may contain quoted strings with commas inside, use a library:

<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>5.9</version>
    <scope>test</scope>
</dependency>

import com.opencsv.CSVReader;
 
try (CSVReader reader = new CSVReader(new FileReader(path))) {
    List<String[]> all = reader.readAll();
    all.remove(0);   // drop header
    return all.toArray(new Object[0][]);
}

OpenCSV handles quoted fields, escaped quotes, multi-line values — the messy reality. For test data you control, the hand-rolled version is enough; for data exported from spreadsheets that users edit, prefer OpenCSV.

Three formats, side by side

Same test data — three formats compared

JSON

Typed POJOs via Jackson
Supports nested objects + arrays
Full IDE autocomplete on fields
Diffs cleanly in Git
Slightly verbose for flat data
Default for structured / nested test data

CSV

Flat tabular, header + rows
Smallest file size
Diffs cleanly in Git
Edit in any text editor or spreadsheet
No nesting, no types — everything is String
Default for flat tabular test data

Excel (.xlsx)

Familiar UI for non-developers
Native types: number, boolean, date
Formulas and formatting supported
Binary zip — Git can't diff
Larger files, slower parse
Reach for it when stakeholders own the data

The decision rubric

Three questions, in order:

Will non-developers regularly edit this data? → Excel.
Does the data have nesting (lists, objects, repeated structures)? → JSON.
Otherwise (flat tabular, edited by engineers)? → CSV.

Most projects converge on a mix: JSON for typed scenario data, CSV for large bulk datasets (10,000-row search-result fixtures), Excel for stakeholder-owned matrices. Pick deliberately per file, not religiously per project.

Nested data — JSON's sweet spot

The case where JSON dominates: data with structure beyond rows-and-columns.

{
  "users": [
    {
      "username": "standard_user",
      "password": "secret_sauce",
      "permissions": ["view", "edit"],
      "preferences": { "theme": "dark", "timezone": "UTC" }
    }
  ]
}

A POJO with nested types reads this directly:

public class Preferences { public String theme; public String timezone; }
 
public class TestUser {
    public String username;
    public String password;
    public List<String> permissions;
    public Preferences preferences;
}

Try expressing that in CSV. You can't — without flattening into multiple files or inventing a custom delimiter syntax. JSON is the right tool when the data isn't flat.

A complete JSON-driven test

package com.mycompany.tests.tests;
 
import com.fasterxml.jackson.databind.ObjectMapper;
import com.mycompany.tests.base.BaseTest;
import com.mycompany.tests.data.LoginScenario;
import com.mycompany.tests.pages.LoginPage;
import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
 
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
 
public class JsonDrivenLoginTest extends BaseTest {
 
    @DataProvider(name = "scenarios")
    public Object[][] scenarios() throws IOException {
        LoginScenario[] data = new ObjectMapper().readValue(
            new File("src/test/resources/testdata/login-scenarios.json"),
            LoginScenario[].class
        );
        return Arrays.stream(data).map(s -> new Object[]{ s }).toArray(Object[][]::new);
    }
 
    @Test(dataProvider = "scenarios")
    public void shouldHandleLoginScenario(LoginScenario s) {
        LoginPage login = new LoginPage(driver).navigateTo();
        login.loginAs(s.username, s.password);
 
        if (s.shouldSucceed) {
            Assert.assertTrue(driver.getCurrentUrl().contains(s.expectedFragment),
                "Expected URL to contain " + s.expectedFragment);
        } else {
            Assert.assertTrue(login.errorText().contains(s.expectedFragment),
                "Expected error to mention " + s.expectedFragment);
        }
    }
}

Same test logic as the Excel and CSV versions; only the data source changes.

Comparison with Cypress and Playwright

// Cypress — fixtures are first-class
cy.fixture("login-scenarios.json").then((data) => {
  data.forEach((s) => {
    it(`logs in: ${s.username}`, () => { ... });
  });
});
 
// Playwright — read JSON natively in test setup
import data from "./login-scenarios.json";
data.forEach((s) => {
  test(`logs in: ${s.username}`, async ({ page }) => { ... });
});

Both modern frameworks handle JSON natively because they're already in JavaScript-land. Selenium with Java needs Jackson (one extra dependency) but the experience is otherwise identical — typed scenarios, IDE autocomplete, fast parse. The slight setup cost is paid back many times over once the suite grows.

The Selenium tool entry and the TestNG cheat sheet cover the test side.

⚠️ Common mistakes

Hand-rolling CSV parsing on data that may have quoted fields. line.split(",") breaks on "Smith, John",42,true — three columns, but split returns four. Either keep your CSV strictly comma-free, or use OpenCSV. Don't trust hand-rolled CSV with arbitrary user-provided data.
Using JSON for huge flat datasets. A 10,000-row JSON file is ~3× larger than the equivalent CSV and slower to parse. For bulk fixtures (search results, product catalogues), CSV is the better fit. JSON's structural advantages don't help when the structure is "a long list of identical-shape objects."
Storing test secrets in committed test-data files. A real password, an API token, a personal email — even in test data — should not live in the repo. Read secrets from environment variables or a secret manager; let the data file reference them by name.

🎯 Practice task

Build the same suite in JSON and CSV. 30–40 minutes.

Create src/test/resources/testdata/login-scenarios.json with the seven rows from this lesson. Build LoginScenario.java and JsonDrivenLoginTest. Run it; seven test results.
Create src/test/resources/testdata/login-scenarios.csv with the same data. Add CsvReader.java and a CsvDrivenLoginTest. Run it; seven test results, identical assertions.
Compare the three implementations. You now have Excel, JSON, and CSV versions of the same data and the same test. Diff them — the test logic is shared (~15 lines), only the data-loading layer differs. Decide which feels best for your context.
Add OpenCSV. Switch your CSV reader to use com.opencsv.CSVReader. Add a row with a quoted field containing a comma: "Smith, John",42,true,welcome. Confirm OpenCSV handles it correctly while the hand-rolled split(",") does not.
Nested JSON. Build a richer TestUser.json with a list of permissions and a nested preferences object. Write a TestUser POJO with List<String> permissions and a nested Preferences class. Use it in a test — Jackson maps the structure automatically. Note that the same shape in CSV would require either flattening or multiple files.
Stretch — externalise environment URLs. Create src/test/resources/environments.json with { "staging": { "baseUrl": "https://staging..." }, "production": { "baseUrl": "..." } }. Read it in BaseTest based on System.getProperty("env", "staging"). Run mvn test -Denv=production to swap targets. This is the foundation of every multi-environment test suite.

Next lesson: cross-browser. We'll plug WebDriverManager (and Selenium Manager) into a parameter-driven BaseTest, then run the same tests against Chrome, Firefox, and Edge in parallel via TestNG.