Common BDD Pitfalls and How to Avoid Them

BDD is one of the most misapplied practices in software testing. Teams adopt Cucumber, write feature files, wire up step definitions, and declare they "do BDD" — while missing the core purpose entirely. The result is Cucumber-flavoured test automation: the same maintenance burden as a Selenium suite, but with extra layers. This lesson names the patterns that cause that outcome and how to fix them.

Pitfall 1: Treating BDD as a testing tool

What it looks like: the QA team discovers Cucumber, adopts it, writes all the Gherkin, and the developers implement step definitions. The product owner has never read a feature file.

Why it happens: Cucumber is usually encountered as a test automation tool, so it's easy to adopt it as a QA tool without ever using it as a collaboration tool.

What's lost: the entire specification benefit. Edge cases are still discovered late. Requirements are still misunderstood. You've added Gherkin syntax overhead to a Selenium framework without gaining shared understanding.

The fix: BDD must involve the business from day one. Feature files that no product owner has ever reviewed are Gherkin-wrapped test scripts, not behaviour specifications. If stakeholders can't (or won't) engage with Gherkin, BDD is the wrong choice for your team — use a plain Selenium framework instead of a Cucumber wrapper that nobody benefits from.

Pitfall 2: Technical Gherkin

What it looks like:

# Bad
When the user clicks element "#login-btn"
And the response status code is 200
And the database record with id 42 has status = "active"

Why it happens: step definitions contain Selenium or SQL calls, and the Gherkin drifts toward describing those calls.

The fix: feature files describe behaviour, step definitions describe implementation. If a CSS selector, status code, database column name, or API endpoint appears in a Gherkin step, it belongs in the step definition instead. Ask: "could a product owner read this step and confirm it matches their understanding?" If not, it's too technical.

Pitfall 3: UI-step addiction

What it looks like:

# Bad — scripting every interaction
When the user moves the mouse to the "Email" field
And the user clicks the "Email" field
And the user types "alice@test.com"
And the user presses Tab
And the user types "password123"
And the user clicks the button with text "Sign In"

Why it happens: testers transcribe manual test steps directly into Gherkin.

The fix: one business-level step replaces a chain of UI interactions:

When the user logs in as "alice@test.com"

The page object contains all the clicks and keystrokes. The feature file contains the intent.
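A minimal sketch of that split, written in plain Java so it runs without Selenium or Cucumber on the classpath. The names Browser, RecordingBrowser, LoginPage and AuthSteps.userLogsInAs, the field labels, and the default password are all hypothetical and exist only for illustration. In a real suite Browser would be a WebDriver and the step method would carry a Cucumber @When annotation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a WebDriver wrapper; not a real Selenium type.
interface Browser {
    void type(String field, String text);
    void click(String button);
}

// Test double that records each UI interaction instead of driving a browser.
final class RecordingBrowser implements Browser {
    final List<String> actions = new ArrayList<>();

    @Override
    public void type(String field, String text) {
        actions.add("type " + field + "=" + text);
    }

    @Override
    public void click(String button) {
        actions.add("click " + button);
    }
}

// Page object: the only place that knows which fields and buttons exist.
final class LoginPage {
    private final Browser browser;

    LoginPage(Browser browser) {
        this.browser = browser;
    }

    void loginAs(String email, String password) {
        browser.type("Email", email);
        browser.type("Password", password);
        browser.click("Sign In");
    }
}

// Step definition class: one business-level step, delegated to the page object.
final class AuthSteps {
    // Assumed fixture password so the Gherkin step only needs the email.
    static final String DEFAULT_PASSWORD = "password123";

    private final LoginPage loginPage;

    AuthSteps(LoginPage loginPage) {
        this.loginPage = loginPage;
    }

    // In a Cucumber project this would be annotated:
    // @When("the user logs in as {string}")
    void userLogsInAs(String email) {
        loginPage.loginAs(email, DEFAULT_PASSWORD);
    }
}
```

If the login form gains a field or a button is renamed, only LoginPage changes. The Gherkin step and the step definition stay as they are.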

Pitfall 4: No Three Amigos

What it looks like: Gherkin is written by the QA engineer after the sprint ends, based on what the code does, not on what the requirement said.

Why it happens: Three Amigos meetings add upfront time. Teams under deadline pressure skip them.

What's lost: the collaboration that prevents bugs. You end up with Gherkin that passes because it was written to match the implementation — not because the implementation was verified against a specification.

The fix: hold a Three Amigos session before every complex story. Even a 15-minute conversation at the start of a sprint can prevent the kind of requirements misunderstanding that costs a week of rework.

Pitfall 5: One massive step definition file

What it looks like: StepDefinitions.java at 800+ lines containing every step for every feature.

Why it happens: the project starts small, step definitions accumulate in one file, nobody refactors.

The fix: organise step definitions by feature domain:

stepdefinitions/
├── AuthSteps.java        ← login, logout, registration
├── ProductSteps.java     ← search, filter, product detail
├── CartSteps.java        ← add, remove, update quantities
├── CheckoutSteps.java    ← payment, address, order confirmation
└── AccountSteps.java     ← profile, settings, order history

With PicoContainer, all these classes receive the same per-scenario TestContext instance through their constructors, so there's no coordination cost to splitting.
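A sketch of the constructor-injection shape this relies on, assuming TestContext is a plain mutable holder (its cartItems field is invented for illustration). With the cucumber-picocontainer module on the classpath, Cucumber constructs the step classes itself and passes in one shared TestContext per scenario. The ManualWiring class below only imitates that by hand so the example runs as plain Java.

```java
import java.util.ArrayList;
import java.util.List;

// Scenario-scoped state shared between step classes (field is illustrative).
final class TestContext {
    final List<String> cartItems = new ArrayList<>();
}

// Each step class asks for the context in its constructor; no static fields.
final class CartSteps {
    private final TestContext context;

    CartSteps(TestContext context) {
        this.context = context;
    }

    void userAddsToCart(String product) {
        context.cartItems.add(product);
    }
}

final class CheckoutSteps {
    private final TestContext context;

    CheckoutSteps(TestContext context) {
        this.context = context;
    }

    int itemsAtCheckout() {
        return context.cartItems.size();
    }
}

// Hand-written stand-in for what the DI container does for each scenario.
final class ManualWiring {
    static int runScenario() {
        TestContext context = new TestContext();              // one instance per scenario
        CartSteps cart = new CartSteps(context);              // same instance injected...
        CheckoutSteps checkout = new CheckoutSteps(context);  // ...into every step class
        cart.userAddsToCart("keyboard");
        cart.userAddsToCart("mouse");
        return checkout.itemsAtCheckout();
    }
}
```

Because state travels through the constructor rather than through static fields, moving a step method from one class to another does not change what it can see.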

Pitfall 6: Scenario count explosion

What it looks like: a single feature has 150 Gherkin scenarios, each covering a slightly different input combination.

Why it happens: testers apply the same exhaustive combinatorial coverage mindset they use for unit tests.

The fix: Gherkin scenarios test behaviour, not every input permutation. Unit tests (JUnit/TestNG) handle exhaustive input validation. A Gherkin scenario for "login with empty email" covers the boundary; you don't need 30 variations of malformed email addresses in Gherkin. Use Scenario Outline for legitimate data variations; use unit tests for boundary testing of validation logic.
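As a rough illustration of where the input permutations belong, here is a table-driven check in plain Java. In a real project this would more likely be a JUnit or TestNG parameterised test. EmailValidator and its rules are hypothetical and deliberately simplistic; the point is the shape, not the validation logic.

```java
import java.util.List;

// Hypothetical, deliberately simple validator; real rules will differ.
final class EmailValidator {
    static boolean isValid(String email) {
        if (email == null || email.isBlank() || email.contains(" ")) {
            return false;
        }
        int at = email.indexOf('@');
        // Exactly one '@', with text on both sides.
        if (at <= 0 || at != email.lastIndexOf('@') || at == email.length() - 1) {
            return false;
        }
        String domain = email.substring(at + 1);
        int dot = domain.lastIndexOf('.');
        // The domain needs a dot that is neither its first nor its last character.
        return dot > 0 && dot < domain.length() - 1;
    }
}

// Table-driven check: the place for the many malformed variations.
final class EmailValidatorCheck {
    static final List<String> MALFORMED = List.of(
        "", " ", "alice", "alice@", "@test.com", "alice@test",
        "alice@@test.com", "alice @test.com", "alice@test.", "alice@.com");

    // Counts malformed inputs the validator wrongly accepts; 0 means all were rejected.
    static long countWronglyAccepted() {
        return MALFORMED.stream().filter(EmailValidator::isValid).count();
    }
}
```

Adding an eleventh malformed address here costs one string in a list. Adding it as a Gherkin scenario costs a browser launch and another block of text for stakeholders to read.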

Pitfall 7: Neglecting step reuse

What it looks like: login steps duplicated across 6 feature files, each with slightly different wording. A change to the login page requires updating 6 step definition methods.

Why it happens: new step definitions are written without checking whether a matching one already exists.

The fix: treat step definitions like a shared library. Before writing a new step, search the glue packages:

grep -r "the user is on the login page" src/test/java/stepdefinitions/

If a step exists, reuse it. If it's close but not quite right, generalise it with a {string} parameter. Step definition maintenance cost should decrease as the suite grows — not increase.
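The "close but not quite right" case can also be spotted mechanically. The sketch below is a hypothetical helper (StepAudit is not part of Cucumber) that replaces every double-quoted literal in a step with {string} and groups steps that become identical. It only catches steps that differ in their quoted values, not steps that differ in wording.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical audit helper; not part of any Cucumber API.
final class StepAudit {
    // Replace each double-quoted literal with the {string} placeholder.
    static String generalise(String stepText) {
        return stepText.replaceAll("\"[^\"]*\"", "{string}");
    }

    // Group step texts by generalised form; keep only groups with two or more members.
    static Map<String, List<String>> mergeCandidates(List<String> stepTexts) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String text : stepTexts) {
            groups.computeIfAbsent(generalise(text), key -> new ArrayList<>()).add(text);
        }
        groups.values().removeIf(group -> group.size() < 2);
        return groups;
    }
}
```

Each key in the returned map is a candidate for a single parameterised step definition, and its value lists the hard-coded variants it could replace.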

Pitfall 8: Ignoring conjunction steps

What it looks like:

When the user fills in the email and clicks submit and waits for the confirmation

Here "and" has become a conjunction inside a single step, not the Gherkin And keyword that starts a new step. The whole line can only map to one step definition method, which now does three things.

The fix: split into separate steps. Each Gherkin step does one thing. The step definition for that step does one thing. Readability and debuggability both improve.
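A rough way to find these steps during an audit, sketched as a hypothetical helper (ConjunctionCheck is not a Cucumber feature). It strips the leading Gherkin keyword and any quoted values, then looks for the word "and" in what remains. Expect some false positives, for example a step that legitimately mentions "terms and conditions" outside quotes.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical audit helper; not part of any Cucumber API.
final class ConjunctionCheck {
    // Leading Gherkin keyword plus the whitespace after it.
    private static final Pattern KEYWORD =
        Pattern.compile("^\\s*(Given|When|Then|And|But)\\s+");
    // The standalone word "and" inside the step body.
    private static final Pattern EMBEDDED_AND =
        Pattern.compile("\\band\\b", Pattern.CASE_INSENSITIVE);

    static boolean hasEmbeddedAnd(String stepLine) {
        String body = KEYWORD.matcher(stepLine).replaceFirst("");
        // Ignore quoted values such as "Terms and Conditions".
        body = body.replaceAll("\"[^\"]*\"", "");
        return EMBEDDED_AND.matcher(body).find();
    }

    // Returns the step lines that look like several actions joined by "and".
    static List<String> flag(List<String> stepLines) {
        return stepLines.stream()
            .filter(ConjunctionCheck::hasEmbeddedAnd)
            .collect(Collectors.toList());
    }
}
```

Flagged lines are candidates for review, not automatic failures. A person still decides whether the step really bundles several actions.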

How to do BDD right

BDD done right:

  • Three Amigos before every complex story
  • Stakeholders review feature files
  • Gherkin owned by the whole team
  • Declarative: describe behaviour, not clicks
  • No technical details (selectors, URLs, SQL)
  • Short scenarios: 3 to 7 steps
  • Reuse steps across features
  • PicoContainer DI, no static state
  • Step definitions delegate to page objects
  • Steps organised by feature domain
  • Reports published after every CI run
  • Feature files are the specification
  • Feature files updated with requirements, not after

The recovery path

If your project has several of these pitfalls, you don't need to rewrite everything. Fix them incrementally:

  1. Start with collaboration: hold one Three Amigos session for the next new story. See whether the quality of the scenarios improves.
  2. Clean one feature file: pick one .feature file and refactor every imperative step to declarative. Run the tests to confirm green.
  3. Split the step definition file: extract one feature domain into its own class. Confirm PicoContainer still injects TestContext correctly.
  4. Publish the report: host the Cucumber HTML report somewhere a stakeholder can access it. Share the link in the next sprint review.

Each of these takes about an hour. None requires rewriting the entire suite. BDD is a practice that improves incrementally: small steps, consistently applied.

⚠️ Common mistakes

  • Trying to fix everything at once. "Our BDD is broken, let's refactor all 200 scenarios this sprint" fails because there's no time, the team gets frustrated, and the pitfalls return. Fix one thing per sprint.
  • Not getting buy-in from the product team. If product owners still write requirements in Confluence and QA translates them to Gherkin, the collaboration gap remains open. The fix is organisational, not technical.
  • Measuring BDD success by scenario count. "We have 300 Gherkin scenarios!" is not a BDD success metric. The metric is: how many requirements misunderstandings were caught before development? How quickly do stakeholders get answers about system behaviour?

🎯 Practice task

Audit your own project for pitfalls. 30 minutes.

  1. Read through every feature file you've written in this course. Mark each step that contains a technical detail, a UI element ID, or implementation information.
  2. Count how many step definition methods are unique to one scenario (not reused). What percentage of your step library is shared?
  3. Identify the longest scenario in your suite. Count the steps. Could it be split into two shorter scenarios without losing test value?
  4. Stretch: pick your worst feature file — the one with the most pitfalls — and refactor it completely. Declarative steps, no technical details, Background for shared setup, reused steps. Run the suite before and after. Confirm green. Compare the readability.

This completes Chapter 5. The final chapter is the capstone project: building a full BDD test suite for an online banking application, combining everything from the course.
