Feature flags / experimentation tools

Feature flag tools let you turn features on and off in production without redeploying — so you can release code dark, roll it out gradually, test it on a slice of real users, and kill it instantly if something breaks. For QA, flags are both a powerful safety mechanism and a new testing surface: every flag multiplies the states your software can be in.

// WHAT THEY ARE

A feature flag (or toggle) is a switch in your code that decides whether a feature is active, evaluated at runtime instead of baked in at deploy. A feature flag platform manages those switches centrally: who sees what, gradual rollout percentages, targeting rules, and an audit trail — so flipping a feature doesn't require a code change or a deploy.
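
As a rough illustration of "evaluated at runtime", here is a minimal sketch. The `FlagStore` class and the `new-checkout` flag name are hypothetical stand-ins for a real platform's client and rule store, not any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory stand-in for a flag platform's rule store."""
    flags: dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str, default: bool = False) -> bool:
        # Unknown flags fall back to the default, so a missing rule
        # never turns a feature on by accident.
        return self.flags.get(name, default)


def checkout_page(store: FlagStore) -> str:
    # The branch is chosen on every call, not fixed at deploy time.
    if store.is_enabled("new-checkout"):
        return "new checkout"
    return "legacy checkout"


store = FlagStore()
before = checkout_page(store)        # flag absent, so the legacy path runs
store.flags["new-checkout"] = True   # the "flip": no code change, no deploy
after = checkout_page(store)         # same code, new path
```

A real platform would replace the dictionary with centrally managed rules, but the calling code keeps the same shape: ask for a flag by name and branch on the answer.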

The defining idea is decoupling deployment from release. Code ships to production behind an off flag (a "dark" launch); releasing it is a separate, controlled act — flip it on for internal users, then a canary slice, then a percentage, then everyone, watching metrics at each ring. If anything goes wrong, you flip it back off in seconds rather than rolling back a deploy. Beyond release control, the same machinery powers experimentation — A/B tests that route variants to user segments and measure the difference — which is why these tools straddle engineering and product.
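
One common way to implement the ring-by-ring rollout is deterministic bucketing: hash the flag name and user ID into a stable bucket, then enable the feature for buckets below the current percentage. The sketch below assumes that approach; the function names, the `internal` allow-list, and the 0..99 bucket range are illustrative choices, not a specific tool's behavior.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(
    flag: str,
    user_id: str,
    percent: int,
    internal: frozenset[str] = frozenset(),
) -> bool:
    # First ring: internal users see the feature regardless of percentage.
    if user_id in internal:
        return True
    # Later rings: a user is in once their bucket falls below the percentage.
    # Raising the percentage only adds users; nobody flips back and forth.
    return bucket(flag, user_id) < percent


users = [f"user-{i}" for i in range(1000)]
ring_10 = {u for u in users if is_enabled("new-checkout", u, 10)}
ring_50 = {u for u in users if is_enabled("new-checkout", u, 50)}
```

Because the bucket depends only on the flag and the user, everyone in `ring_10` is also in `ring_50`, and setting the percentage back to 0 switches the feature off for all non-internal users.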

For QA, that power comes with a cost: flags multiply state. A feature behind a flag has at least two code paths to test; several flags interact combinatorially. Validating flag behavior, targeting rules, and the on/off (and rollback) paths becomes part of the test strategy.
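
The arithmetic behind "flags multiply state" is simple: n independent boolean flags give 2^n possible configurations. A small sketch that enumerates them (the flag names are hypothetical):

```python
from itertools import product

FLAGS = ["new-checkout", "new-pricing", "dark-mode"]  # hypothetical names


def all_states(flags):
    """Yield every on/off assignment: 2 ** len(flags) of them."""
    for values in product([False, True], repeat=len(flags)):
        yield dict(zip(flags, values))


states = list(all_states(FLAGS))          # 3 flags -> 8 configurations
ten_flags = [f"flag-{i}" for i in range(10)]
ten_flag_count = sum(1 for _ in all_states(ten_flags))  # 10 flags -> 1024
```

Three flags are still enumerable in a test matrix; ten are not, which is why a test strategy has to choose which combinations matter rather than attempt them all.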

// WHEN YOU NEED THEM

You reach for flags when releasing a change is risky and you want to limit blast radius: a big feature you want to ship incrementally, a change you want to validate on real traffic before full rollout, or anything you'd want to disable instantly without an emergency deploy. They're also how teams practice trunk-based development — merging unfinished work behind a flag rather than holding long-lived branches — and how product teams run experiments.

// The signals

  • Wanting to decouple "the code is deployed" from "users can see it"
  • Needing a kill switch for risky changes
  • Rolling out gradually (canary/percentage) and watching metrics
  • Running A/B experiments
  • Merging to main continuously without exposing half-built features

// COMPARISON

Tool         | Type                 | Strength                             | Best for
-------------+----------------------+--------------------------------------+------------------------------------------
LaunchDarkly | Commercial           | Detailed targeting, fast propagation | Enterprise default; the mature standard
Unleash      | Open source (+ paid) | Progressive delivery, self-host      | Teams wanting OSS with enterprise features
GrowthBook   | Open source (+ paid) | Experimentation + flags, modular     | Experiment-led teams; local evaluation
Flagsmith    | Open source (+ paid) | Flexible deploy, on-prem option      | Teams needing self-hosted control
Flipt        | Open source          | Git-native, lightweight              | GitOps-style flag management

// OPEN SOURCE VS PAID

This space has a strong open-source tier and one enterprise leader. LaunchDarkly is the commercial default, with deep targeting, fast flag propagation, and enterprise governance. It is the tool many teams start with, until a renewal prompts the "could we self-host this?" question. The open-source contenders have matured into serious answers: Unleash, GrowthBook, Flagsmith, and Flipt all self-host, each with a different lean (Unleash on progressive delivery, GrowthBook on experimentation, Flagsmith on flexible/on-prem deployment, Flipt on a lightweight Git-native model). Most offer a paid hosted tier on top of the free core.

Worth knowing: OpenFeature (a CNCF project) is the vendor-neutral standard for flag evaluation APIs. Instrument against it and you can switch providers without rewriting your flag code, the same lock-in hedge OpenTelemetry is for observability. For learners and small teams, a self-hosted Unleash or GrowthBook, or env-var flags graduating to a platform via OpenFeature, covers the ground at no licence cost.
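
The idea behind a vendor-neutral layer can be sketched without any SDK. The code below is not the OpenFeature API; `FlagProvider`, `EnvVarProvider`, `InMemoryProvider`, and the `FLAG_` environment-variable prefix are all hypothetical names, chosen only to show application code depending on an interface rather than on a vendor.

```python
import os
from typing import Protocol


class FlagProvider(Protocol):
    """The only surface application code is allowed to depend on."""

    def get_boolean(self, key: str, default: bool) -> bool: ...


class EnvVarProvider:
    """Reads flags from environment variables such as FLAG_NEW_BANNER=true."""

    def get_boolean(self, key: str, default: bool) -> bool:
        raw = os.environ.get("FLAG_" + key.upper().replace("-", "_"))
        if raw is None:
            return default
        return raw.strip().lower() in {"1", "true", "on", "yes"}


class InMemoryProvider:
    """Stand-in for a hosted platform; a real one would wrap that vendor's client."""

    def __init__(self, values: dict[str, bool]) -> None:
        self._values = values

    def get_boolean(self, key: str, default: bool) -> bool:
        return self._values.get(key, default)


def banner(provider: FlagProvider) -> str:
    # Application code never names a vendor, so swapping providers
    # does not touch this function.
    if provider.get_boolean("new-banner", False):
        return "new banner"
    return "old banner"
```

Starting with `EnvVarProvider` and later passing a platform-backed provider to the same `banner` function is the "graduating" path in miniature.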

// HOW TO CHOOSE

  1. Flags, experiments, or both? Pure release control and kill switches → Unleash, Flagsmith, LaunchDarkly. Heavy A/B testing and metrics → GrowthBook or a platform with strong experimentation. Many teams need both, but the emphasis shifts the choice.
  2. Self-host or managed? If data control or cost is the driver, the open-source platforms (Unleash, GrowthBook, Flagsmith, Flipt) self-host. If you want it handled and have budget, LaunchDarkly is the polished managed option.
  3. Instrument vendor-neutral. Whatever you pick, evaluate flags through OpenFeature where you can. It keeps you from rewriting flag code if you switch providers later.
  4. How big is the team? Small teams (≤15) do well on Flagsmith/GrowthBook with light governance; mid-size teams should evaluate LaunchDarkly/Statsig with proper RBAC and pipeline checks; large orgs need a governance framework and lifecycle automation, not just a tool.
  5. Plan for flag lifecycle from day one. The tool is the easy part; the discipline (naming, ownership, sunset policies) is what determines whether flags help or rot. Pick a tool whose workflow supports retiring flags, and use it.
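
The lifecycle discipline in the last step can be made checkable. A minimal sketch, assuming each flag is registered with an owner and a sunset date; the `FlagRecord` type, the registry entries, and the dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlagRecord:
    name: str      # a naming convention such as "<kind>.<feature>"
    owner: str     # the team accountable for retiring the flag
    sunset: date   # the date by which the flag should be removed


def overdue(records: list[FlagRecord], today: date) -> list[str]:
    """Return the names of flags whose sunset date has already passed."""
    return sorted(r.name for r in records if r.sunset < today)


# Hypothetical registry entries.
REGISTRY = [
    FlagRecord("release.new-checkout", "team-payments", date(2024, 3, 1)),
    FlagRecord("experiment.pricing-v2", "team-growth", date(2024, 9, 1)),
]

stale = overdue(REGISTRY, today=date(2024, 6, 1))
```

Run in a pipeline, a check like this could fail the build or open a ticket for the owning team when `stale` is non-empty, which turns a sunset policy into something enforced rather than remembered.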

// COMMON MISTAKES

  • Not testing the off path (and the rollback). A flag has at least two states; teams test the new feature on and forget the experience with it off — which is exactly the state they'll fall back to in an incident. Test both, and test the flip.
  • Combinatorial flag explosion left untested. Several interacting flags create many combinations. You can't test all of them — but teams often test none of the interactions, then get surprised when two flags conflict in production.
  • Flag debt. Flags left in the code long after a feature is fully rolled out become permanent complexity — dead branches, confusing logic, risk. Without sunset policies, flag debt compounds. Retire flags once they've served their purpose.
  • Treating "behind a flag" as "tested." Shipping code dark is not the same as validating it. A flag controls exposure, not quality — the feature still needs testing before you flip it on for users.
  • No governance on who can flip what. A flag that anyone can toggle in production is an incident waiting to happen. Targeting and kill switches need access control and an audit trail, especially at scale.
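
The governance point can be sketched as a toggle that checks permission and records every attempt. This is an illustrative model only; `GovernedFlags`, its permission table, and the user names are hypothetical, and real platforms implement access control and audit logs in their own ways.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class GovernedFlags:
    # Hypothetical permission table: flag name -> users allowed to toggle it.
    allowed: dict[str, set[str]]
    state: dict[str, bool] = field(default_factory=dict)
    audit: list[dict] = field(default_factory=list)

    def set_flag(self, flag: str, user: str, value: bool) -> None:
        permitted = user in self.allowed.get(flag, set())
        # Record every attempt, including denied ones, before acting on it.
        self.audit.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "flag": flag,
            "value": value,
            "permitted": permitted,
        })
        if not permitted:
            raise PermissionError(f"{user} may not toggle {flag}")
        self.state[flag] = value


flags = GovernedFlags(allowed={"new-checkout": {"release-manager"}})
flags.set_flag("new-checkout", "release-manager", True)
try:
    flags.set_flag("new-checkout", "intern", False)
except PermissionError:
    pass  # the denied attempt is still in the audit trail
```

After these two calls the flag is still on, and the audit list holds both the permitted flip and the rejected one, which is the record an incident review would need.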