Test Sharding Across Multiple Machines

In-process parallelism runs your tests faster on one machine. Sharding runs your tests faster across many machines. When a single runner can't give you enough parallelism (because your test suite is large, your tests are memory-hungry browser tests, or you've already maxed out the CPU), sharding is the next lever to pull. A 200-test E2E suite that takes 40 minutes on one runner takes roughly 10 minutes split across four, assuming the shards are balanced and ignoring per-runner setup time.

What sharding is

Sharding divides a test suite into N chunks of roughly equal size, called shards. Each shard runs independently on a separate CI machine. The machines run in parallel; when all shards complete, their results are merged into a single report.
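
To make the idea concrete, here is a minimal Python sketch of splitting an ordered list of tests into N contiguous chunks by count. The function name split_into_shards is hypothetical, and this is only an illustration of count-based splitting, not Playwright's actual implementation.

```python
def split_into_shards(tests, total_shards):
    """Split an ordered list of tests into contiguous, near-equal chunks."""
    if total_shards < 1:
        raise ValueError("total_shards must be at least 1")
    base, extra = divmod(len(tests), total_shards)
    shards = []
    start = 0
    for index in range(total_shards):
        # The first `extra` shards take one additional test each.
        size = base + (1 if index < extra else 0)
        shards.append(tests[start:start + size])
        start += size
    return shards


# Illustrative test names; a real suite would list spec files or test IDs.
tests = [f"test_{i}" for i in range(10)]
shards = split_into_shards(tests, 4)
print([len(shard) for shard in shards])  # [3, 3, 2, 2]
```

Note that the split is by test count, not by duration, which is why shards can end up unbalanced in practice.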

The key difference from level 2 parallel jobs (different test types in different jobs) is that sharding splits a single homogeneous suite — the same test type, the same framework, just different tests — across multiple runners. It's the right tool when your E2E suite is too large to run in one job within an acceptable time.

Playwright sharding

Playwright has built-in shard support. --shard=M/N tells Playwright to run the M-th chunk of N total chunks:

# .github/workflows/playwright-sharded.yml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
 
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npx playwright install --with-deps chromium
 
      - name: Run shard ${{ matrix.shard }} of 4
        run: npx playwright test --shard=${{ matrix.shard }}/4 --reporter=blob
 
      - name: Upload blob report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shard }}
          path: blob-report/
          retention-days: 1

--reporter=blob produces a compact binary format designed for merging. Don't use --reporter=html on sharded runs — each shard generates a partial HTML report that can't be combined.

After all four shards complete, merge their results into one report:

  merge-reports:
    runs-on: ubuntu-latest
    needs: test
    if: always()
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
 
      - uses: actions/download-artifact@v4
        with:
          pattern: blob-report-*
          path: all-blob-reports
          merge-multiple: true
 
      - run: npx playwright merge-reports --reporter html ./all-blob-reports
 
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 14

The merge job downloads all four blob reports and combines them into a single interactive HTML dashboard. You get one report covering all 200 tests, regardless of which shard ran each one.

TestNG sharding (manual approach)

TestNG doesn't have a built-in shard flag, so you split suites manually:

<!-- testng-shard-1.xml -->
<suite name="Shard 1" parallel="methods" thread-count="2">
    <test name="Shard 1 Tests">
        <classes>
            <class name="com.example.tests.CheckoutTests"/>
            <class name="com.example.tests.LoginTests"/>
            <class name="com.example.tests.SearchTests"/>
        </classes>
    </test>
</suite>

<!-- testng-shard-2.xml -->
<suite name="Shard 2" parallel="methods" thread-count="2">
    <test name="Shard 2 Tests">
        <classes>
            <class name="com.example.tests.ProductTests"/>
            <class name="com.example.tests.CartTests"/>
            <class name="com.example.tests.PaymentTests"/>
        </classes>
    </test>
</suite>

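Then run each suite in a parallel matrix job. The matrix below assumes a third file, testng-shard-3.xml, built the same way as the two above, and a pom.xml that maps the suiteFile property to Surefire's suiteXmlFiles setting: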

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with: { java-version: '21', distribution: 'temurin', cache: 'maven' }
      - run: mvn test -DsuiteFile=testng-shard-${{ matrix.shard }}.xml -Dheadless=true -B
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: surefire-reports-shard-${{ matrix.shard }}
          path: target/surefire-reports/
          retention-days: 7

The downside: maintaining multiple suite XML files is manual work. When you add a new test class, you decide which shard it belongs to. Note that maven-surefire-plugin's forkCount and reuseForks options don't solve this: they control how many JVMs Surefire forks on a single machine, not how tests are distributed across machines. For most teams, maintaining 3–4 explicit suite files is a reasonable trade-off for the speed gain.
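
If hand-editing the suite files becomes tedious, one option is to generate them from a class list. The sketch below is an assumption-laden illustration, not a TestNG feature: the helper names round_robin and build_suite_xml are hypothetical, and the suite attributes simply mirror the examples in this lesson.

```python
import xml.etree.ElementTree as ET


def round_robin(class_names, total_shards):
    """Deal class names out to shards one at a time, like cards."""
    return [class_names[i::total_shards] for i in range(total_shards)]


def build_suite_xml(shard_number, class_names):
    """Build the text of a testng-shard-N.xml file for one shard."""
    suite = ET.Element(
        "suite",
        {"name": f"Shard {shard_number}", "parallel": "methods", "thread-count": "2"},
    )
    test = ET.SubElement(suite, "test", {"name": f"Shard {shard_number} Tests"})
    classes = ET.SubElement(test, "classes")
    for class_name in class_names:
        ET.SubElement(classes, "class", {"name": class_name})
    ET.indent(suite, space="    ")  # requires Python 3.9 or newer
    return ET.tostring(suite, encoding="unicode")


# Class names taken from the suite files above.
all_classes = [
    "com.example.tests.CheckoutTests",
    "com.example.tests.LoginTests",
    "com.example.tests.SearchTests",
    "com.example.tests.ProductTests",
    "com.example.tests.CartTests",
    "com.example.tests.PaymentTests",
]
for number, members in enumerate(round_robin(all_classes, 3), start=1):
    print(f"<!-- testng-shard-{number}.xml -->")
    print(build_suite_xml(number, members))
```

Round-robin assignment ignores test duration, so treat the generated files as a starting point and adjust them once you have timing data.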

Cypress sharding

Cypress Cloud (the paid service) handles sharding automatically — it distributes test files across multiple machines and collects results. Without it, use the cypress-parallel npm package:

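# Install once, locally
npm install --save-dev cypress-parallel

# Workflow step; assumes a "cy:run" script (for example "cypress run") in package.json
- run: npx cypress-parallel -s cy:run -t 4 -d cypress/e2e -a '"--browser chrome"'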

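-t 4 starts four parallel Cypress processes (the package calls them threads). The package splits test files across them and aggregates the results. Keep in mind that all four run on the same runner, so this is closer to in-process parallelism than to multi-machine sharding, and it is bounded by that machine's CPU and memory. It's not as seamless as Playwright's native sharding, but it shortens the run without Cypress Cloud.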

Balancing shards

Simple index-based sharding assigns tests to shards in file order. This works but can produce unbalanced shards — one shard gets all the slow tests, others finish in minutes and wait. The pipeline time is determined by the slowest shard.
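
A small sketch shows why the slowest shard matters. The durations below are made-up numbers for illustration, and pipeline_minutes is a hypothetical helper name.

```python
def pipeline_minutes(shard_durations):
    """Wall-clock time of a sharded run: the slowest shard sets the pace."""
    return max(sum(durations) for durations in shard_durations)


# Hypothetical per-test durations in minutes, grouped by shard.
# Both layouts contain the same 33 minutes of tests in total.
unbalanced = [[9, 8, 7], [1, 1, 1], [1, 1, 1], [1, 1, 1]]
balanced = [[9], [8], [7, 1], [1, 1, 1, 1, 1, 1, 1, 1]]

print(pipeline_minutes(unbalanced))  # 24
print(pipeline_minutes(balanced))    # 9
```

Same tests, same total work, but the unbalanced layout keeps the pipeline busy for 24 minutes while three runners sit idle after 3.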

To diagnose imbalance: in Playwright, --reporter=json produces timing data for every test. Sort by duration and check whether slow tests cluster in one shard. To fix: manually reorder tests so each shard has a mix of fast and slow, or use Playwright's --reporter=list timing output to guide manual rebalancing.
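
Once you have per-test durations, a simple greedy heuristic can suggest a better grouping: take tests from slowest to fastest and give each one to the shard with the least work so far. The sketch below assumes you have already extracted durations into a plain dictionary (the names and numbers are invented), and balance_shards is a hypothetical helper, not a Playwright API. Its output is a guide for manual rebalancing.

```python
import heapq


def balance_shards(durations, total_shards):
    """Greedy longest-first assignment: each test goes to the lightest shard so far."""
    # Heap entries are (current load in seconds, shard index).
    heap = [(0.0, index) for index in range(total_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(total_shards)]
    slowest_first = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for name, seconds in slowest_first:
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return shards


# Hypothetical durations in seconds, e.g. collected from a JSON report.
durations = {
    "checkout": 90,
    "payment": 80,
    "search": 30,
    "login": 20,
    "cart": 20,
    "profile": 10,
}
for number, members in enumerate(balance_shards(durations, 2), start=1):
    total = sum(durations[name] for name in members)
    print(f"shard {number}: {members} ({total}s)")
```

With these numbers the two shards come out at 130 and 120 seconds, close to the ideal 125 each. The heuristic is not guaranteed to be optimal, but it is usually good enough to guide which tests to move.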

Cost vs speed trade-off

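Each shard is one CI runner, and runners cost money (in GitHub Actions minutes or in infrastructure). Sharding does not reduce total compute: 40 minutes of test execution split across 4 runners is still about 40 runner-minutes (4 × 10), roughly the same as running serially, plus the setup steps (checkout, dependency install, browser download) that every shard repeats.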

The saving is in developer waiting time and feedback speed, not in compute cost. Whether the cost is justified depends on your team's velocity. On a team that opens 10+ PRs per day, cutting pipeline time from 40 minutes to 10 minutes is worth the compute cost. On a team with one PR per day, serial execution is fine.
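
A back-of-the-envelope sketch makes the trade-off visible. It assumes perfectly balanced shards and a fixed setup time per runner; the function name sharding_cost and the 2-minute setup figure are illustrative assumptions, not measurements.

```python
def sharding_cost(serial_test_minutes, shards, setup_minutes_per_runner):
    """Return (wall-clock minutes, total runner-minutes) for a balanced sharded run."""
    wall_clock = serial_test_minutes / shards + setup_minutes_per_runner
    runner_minutes = serial_test_minutes + shards * setup_minutes_per_runner
    return wall_clock, runner_minutes


# 40 minutes of tests, 2 minutes of setup on every runner (assumed).
for shards in (1, 2, 4, 8):
    wall_clock, runner_minutes = sharding_cost(40, shards, 2)
    print(f"{shards} shard(s): {wall_clock:.1f} min wall-clock, {runner_minutes} runner-minutes")
```

Under these assumptions, going from 1 to 4 shards cuts the wait from 42 to 12 minutes while the bill grows only from 42 to 48 runner-minutes. The extra cost comes from repeated setup, not from the tests themselves.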

⚠️ Common mistakes

  • Using --reporter=html on sharded Playwright runs. Each shard generates a partial HTML report that only covers its own tests. Running multiple shards with the HTML reporter gives you 4 separate partial reports. Use --reporter=blob for shards and merge at the end.
  • Not setting fail-fast: false on the shard matrix. If shard 2 fails, GitHub by default cancels every other shard in the matrix that is still queued or in progress. You lose coverage from those shards and the merge step may run with incomplete data. Always set fail-fast: false on shard matrices.
  • Forgetting the merge step. Without the merge job, you have four separate blob artifacts and no usable HTML report. The merge step is not optional — it's what turns four partial results into one actionable report.

🎯 Practice task

Set up a 3-shard Playwright or TestNG run — 40 minutes.

  1. Playwright: add --shard=${{ matrix.shard }}/3 and --reporter=blob to your existing workflow. Add the merge job from this lesson. Run. Confirm three parallel jobs appear in the Actions tab and the merge job produces a single HTML report.
  2. TestNG: split your test classes across three testng-shard-N.xml files. Add a matrix job referencing the shard number. Run. Confirm each shard uploads a separate Surefire artifact.
  3. Time the sharded run vs a serial run. Note the actual speedup — it's usually 70–80% of the theoretical maximum due to setup overhead.
  4. Stretch: add timing data to identify your slowest tests. In Playwright: look for tests over 30 seconds in the merged HTML report. In TestNG: check the Surefire XML time attribute. Move the three slowest tests to different shards.
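
For the TestNG part of the stretch task, the time attribute can be read with a few lines of Python. This is a sketch under assumptions: it expects the JUnit-style XML that Surefire typically writes to target/surefire-reports/TEST-*.xml, the sample report is invented and trimmed, and slowest_tests is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def slowest_tests(report_xml, limit=3):
    """Return the `limit` slowest (test id, seconds) pairs from one Surefire report."""
    root = ET.fromstring(report_xml.strip())
    timings = []
    for case in root.iter("testcase"):
        test_id = f"{case.get('classname')}.{case.get('name')}"
        timings.append((test_id, float(case.get("time", "0"))))
    return sorted(timings, key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical, trimmed-down report for illustration only.
SAMPLE = """
<testsuite name="com.example.tests.CheckoutTests" tests="4">
  <testcase classname="com.example.tests.CheckoutTests" name="paysByCard" time="41.2"/>
  <testcase classname="com.example.tests.CheckoutTests" name="appliesCoupon" time="3.5"/>
  <testcase classname="com.example.tests.CheckoutTests" name="guestCheckout" time="33.0"/>
  <testcase classname="com.example.tests.CheckoutTests" name="emptyCart" time="0.8"/>
</testsuite>
"""

for test_id, seconds in slowest_tests(SAMPLE):
    print(f"{seconds:6.1f}s  {test_id}")
```

To cover a whole run, you would apply the same function to each report file from each shard's artifact and combine the results before picking the tests to move.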

The next lesson covers an optimization that makes sharding much faster in practice: caching the dependencies that every shard currently re-downloads from scratch.
