Edge case discovery with AI
Humans write the happy path and three or four obvious negatives. AI generates fifty boundary inputs you would never think of: a null character mid-string, right-to-left Arabic in a left-to-right form, leap-year date arithmetic, integer overflow at the API boundary. The work is then sorting useful adversarial inputs from noise. That sorting discipline separates a team that runs fifty AI-generated edge cases and finds three genuine defects from a team that runs the same fifty and finds none because the inputs never reached the application logic.
Six categories of edge case
Category-specific prompting distributes coverage across input-space dimensions AI handles well.
Six categories of edge case map well to LLM generation capabilities: boundary values, null and missing fields, unicode and encoding anomalies, locale-specific formats, race conditions and timing, and adversarial inputs. Each category represents a distinct failure mode that manual case design commonly misses — either because the tester does not think of it, or because generating exhaustive boundary values for a single field takes longer than the time available.
AI generates coverage across all six categories in seconds. The quality is uneven — adversarial inputs tend to be more creative than necessary, boundary values tend to be more thorough than a human would produce — but the overall set represents a coverage floor that manual case design rarely achieves in comparable time.
How to ask for them
Category-specified prompts produce distributed coverage; unspecified prompts produce a single bucket.
The quality difference between a naive edge-case prompt and a category-specified prompt is substantial. A naive prompt — "give me edge cases for this field" — produces a list that clusters around the most obvious boundaries and ignores the locale, unicode, and race-condition categories entirely. The LLM treats "edge case" as a single concept and optimises for surface coverage.
Specifying categories explicitly distributes generation across the full input space. The model generates against each category label rather than guessing what you consider an edge case, and the resulting set covers dimensions that a naive prompt would miss entirely. The prompt pattern below exercises five of the six categories with a prescribed distribution. Race conditions and timing are left out because they depend on the ordering of requests rather than on a single input value, so they need a separate prompt.
    # Edge case generation: category-specified prompt
    # Field: email VARCHAR(255), required, must be a valid RFC 5321 address

    Give me 20 test inputs for this email field.
    Distribute them across these specific categories:

    Boundary (5 inputs): empty string, max-length (255 chars), exactly valid minimal address (a@b.io), near-boundary lengths
    Null & missing (3 inputs): null, undefined, whitespace-only string
    Unicode & encoding (5 inputs): non-ASCII local part, RTL script, emoji in local part, combining diacritics, surrogate pair
    Locale (4 inputs): Japanese full-width chars, German umlaut, French accents, Arabic characters
    Adversarial (3 inputs): SQL injection fragment, XSS payload, SMTP injection via newlines

    Format: one input per line, category label as prefix.
    Include expected validation result (valid/invalid).
    No commentary.
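The same prompt can be assembled from a small table of categories, which keeps the requested total and the per-category counts in agreement when the distribution changes. The following is a minimal sketch, not a prescribed tool: the `Category` class and `build_prompt` function are hypothetical names introduced here, and the wording of the output simply mirrors the prompt above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """One edge-case category in the prompt (hypothetical helper type)."""
    name: str               # label the model prefixes each input with
    count: int              # number of inputs requested for this category
    hints: tuple[str, ...]  # examples that steer generation within the category


def build_prompt(field_spec: str, categories: list[Category]) -> str:
    """Assemble a category-specified prompt; the total is derived from the counts."""
    total = sum(c.count for c in categories)
    lines = [
        f"Field: {field_spec}",
        f"Give me {total} test inputs for this field.",
        "Distribute them across these specific categories:",
    ]
    for c in categories:
        lines.append(f"{c.name} ({c.count} inputs): {', '.join(c.hints)}")
    lines.append("Format: one input per line, category label as prefix.")
    lines.append("Include expected validation result (valid/invalid). No commentary.")
    return "\n".join(lines)


# Distribution copied from the email-field prompt in the text.
EMAIL_CATEGORIES = [
    Category("Boundary", 5, ("empty string", "max-length (255 chars)",
                             "minimal valid address (a@b.io)", "near-boundary lengths")),
    Category("Null & missing", 3, ("null", "undefined", "whitespace-only string")),
    Category("Unicode & encoding", 5, ("non-ASCII local part", "RTL script",
                                       "emoji in local part", "combining diacritics",
                                       "surrogate pair")),
    Category("Locale", 4, ("Japanese full-width chars", "German umlaut",
                           "French accents", "Arabic characters")),
    Category("Adversarial", 3, ("SQL injection fragment", "XSS payload",
                                "SMTP injection via newlines")),
]

prompt = build_prompt(
    "email VARCHAR(255), required, must be a valid RFC 5321 address",
    EMAIL_CATEGORIES,
)
print(prompt)
```

Deriving the total from the counts avoids a prompt that asks for 20 inputs while the category lines add up to a different number.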
The signal-to-noise problem
Sixty to eighty percent useful — treat AI-generated edge cases as a starting set, not a final test suite.
A typical AI-generated edge-case set for a complex field contains roughly fifty inputs. Around thirty will be genuine test candidates that reach your application logic. Approximately fifteen will never get that far, because field validation rejects them at the boundary. Five will be duplicates with different surface forms: the SQL injection attempt looks different from the XSS payload, but both exercise the same application-layer sanitisation path.
A two-pass review handles this efficiently. First, run the generated set through schema validation. This pass is automated, takes seconds, and discards the fifteen invalid inputs without human review. Then scan the remaining thirty-five for duplicates and clustering. The human review step is proportionate to the value it adds: evaluating thirty-five inputs for genuine coverage is a ten-minute task, not a morning.
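The two passes can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: `two_pass_review`, `passes_schema`, and `signature` are hypothetical names, the schema check is a deliberately crude stand-in for real field validation, and the signature (Unicode NFKC normalisation, trimming, case folding) is only one possible way to spot inputs that differ in surface form.

```python
import unicodedata
from typing import Callable


def two_pass_review(
    candidates: list[str],
    passes_schema: Callable[[str], bool],
    signature: Callable[[str], str],
) -> tuple[list[str], dict[str, list[str]]]:
    """Return (rejected, clusters) for a generated edge-case set."""
    # Pass 1: automated schema check. Rejected inputs never reach
    # application logic, so they are set aside without human review.
    rejected = [c for c in candidates if not passes_schema(c)]
    survivors = [c for c in candidates if passes_schema(c)]

    # Pass 2: group survivors by a coarse signature so a reviewer sees
    # likely duplicates side by side instead of scattered through the list.
    clusters: dict[str, list[str]] = {}
    for c in survivors:
        clusters.setdefault(signature(c), []).append(c)
    return rejected, clusters


def passes_schema(value: str) -> bool:
    # Assumed stand-in for the real validator: non-empty, at most
    # 255 characters, exactly one "@".
    return 0 < len(value) <= 255 and value.count("@") == 1


def signature(value: str) -> str:
    # Assumed duplicate heuristic: fold width variants, whitespace, and case.
    return unicodedata.normalize("NFKC", value).strip().casefold()


candidates = [
    "a@b.io",
    "A@B.IO",
    " a@b.io ",
    "\uff41@b.io",          # full-width "a" in the local part
    "",
    "no-at-sign",
    "x" * 300 + "@b.io",
]

rejected, clusters = two_pass_review(candidates, passes_schema, signature)
print(len(rejected), "rejected by schema")
for sig, members in clusters.items():
    print(repr(sig), "->", len(members), "candidate(s)")
```

Clusters with more than one member are only candidates for removal. Whether two inputs really exercise the same code path is still a human judgement, which is the ten-minute review described above.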
Categories AI handles badly
Domain-specific rules, state machines, and performance boundaries require human case design.
AI generates input-space coverage well. It generates state-space coverage badly. The three categories where AI-generated edge cases are consistently low-quality are domain-specific business rules (the model does not know what your application does), multi-step state machines (the model cannot reason deeply about valid state transitions), and performance and infrastructure boundaries (the model has no knowledge of your infrastructure constraints or production load profiles).
For these categories, the pattern that works is AI-assisted case design rather than AI-generated cases. Describe the business rule or state machine to the model and ask it to identify which transitions are underspecified or ambiguous — then a human designs the test cases against those ambiguities. The model is useful as a sounding board for incomplete specifications; it is not useful as a generator of test cases for business logic it cannot observe.
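One way to prepare for that conversation is to write the state machine as an explicit transition table and list the state and event pairs it leaves undefined. The sketch below assumes a hypothetical order workflow; the states, events, and the `unspecified_transitions` helper are all invented for illustration. It only finds missing entries. Deciding whether a gap is an intentional rejection or an oversight remains a human decision, with or without the model as a sounding board.

```python
from itertools import product


def unspecified_transitions(
    states: list[str],
    events: list[str],
    transitions: dict[tuple[str, str], str],
) -> list[tuple[str, str]]:
    """List (state, event) pairs for which the specification defines no outcome."""
    return [
        (state, event)
        for state, event in product(states, events)
        if (state, event) not in transitions
    ]


# Hypothetical order workflow, used only as an example specification.
STATES = ["draft", "paid", "shipped", "cancelled"]
EVENTS = ["pay", "ship", "cancel"]
TRANSITIONS = {
    ("draft", "pay"): "paid",
    ("draft", "cancel"): "cancelled",
    ("paid", "ship"): "shipped",
    ("paid", "cancel"): "cancelled",
}

gaps = unspecified_transitions(STATES, EVENTS, TRANSITIONS)
for state, event in gaps:
    print(f"undefined: event '{event}' in state '{state}'")
```

Each gap, such as a cancel event arriving after an order has shipped, is a concrete question to put to the specification's owner before any test case is written against it.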