Feature flags / experimentation tools

Feature flag tools let you turn features on and off in production without redeploying — so you can release code dark, roll it out gradually, test it on a slice of real users, and kill it instantly if something breaks. For QA, flags are both a powerful safety mechanism and a new testing surface: every flag multiplies the states your software can be in.

// WHAT THEY ARE

A feature flag (or toggle) is a switch in your code that decides whether a feature is active, evaluated at runtime instead of baked in at deploy. A feature flag platform manages those switches centrally: who sees what, gradual rollout percentages, targeting rules, and an audit trail — so flipping a feature doesn't require a code change or a deploy.

The defining idea is decoupling deployment from release. Code ships to production behind an off flag (a "dark" launch); releasing it is a separate, controlled act — flip it on for internal users, then a canary slice, then a percentage, then everyone, watching metrics at each ring. If anything goes wrong, you flip it back off in seconds rather than rolling back a deploy. Beyond release control, the same machinery powers experimentation — A/B tests that route variants to user segments and measure the difference — which is why these tools straddle engineering and product.
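The ring-by-ring rollout described above can be sketched as a deterministic percentage check. This is a minimal illustration, not how any particular platform implements it: the function names, the flag key, and the choice of SHA-256 for bucketing are all assumptions made for the example.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int,
               internal_users: frozenset = frozenset()) -> bool:
    """Internal users always see the feature; everyone else is bucketed."""
    if user_id in internal_users:
        return True
    # A user is in the rollout when their bucket falls below the percentage,
    # so raising the percentage only ever adds users, never removes them.
    return bucket(flag_key, user_id) < rollout_percent
```

Because the bucket depends only on the flag key and the user id, a given user gets the same answer on every request, and setting the percentage to 0 acts as the kill switch for everyone outside the internal ring.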

For QA, that power comes with a cost: flags multiply state. A feature behind a flag has at least two code paths to test; several flags interact combinatorially. Validating flag behavior, targeting rules, and the on/off (and rollback) paths becomes part of the test strategy.
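To make the combinatorial growth concrete, the sketch below enumerates every on/off state for a set of independent boolean flags, and also builds a smaller set that still exercises every pair of flags in all four on/off combinations. The function names are hypothetical, and the pairwise set is one simple construction (at most two flags on at a time), not a minimal covering array.

```python
from itertools import combinations, product


def all_states(flags):
    """Every on/off assignment: 2 ** len(flags) states."""
    return [dict(zip(flags, values))
            for values in product([False, True], repeat=len(flags))]


def pairwise_states(flags):
    """States with at most two flags on and all others off.

    For any pair of flags this covers off/off (nothing on), on/off and
    off/on (one flag on), and on/on (exactly that pair on).
    """
    states = []
    for size in (0, 1, 2):
        for on in combinations(flags, size):
            states.append({name: name in on for name in flags})
    return states
```

With ten flags, the exhaustive set has 1024 states while the pairwise set has 56, which is the kind of reduction that makes interaction testing feasible rather than skipped.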

// WHEN YOU NEED THEM

You reach for flags when releasing a change is risky and you want to limit blast radius: a big feature you want to ship incrementally, a change you want to validate on real traffic before full rollout, or anything you'd want to disable instantly without an emergency deploy. They're also how teams practice trunk-based development — merging unfinished work behind a flag rather than holding long-lived branches — and how product teams run experiments.

// The signals

Wanting to decouple "the code is deployed" from "users can see it"
Needing a kill switch for risky changes
Rolling out gradually (canary/percentage) and watching metrics
Running A/B experiments
Merging to main continuously without exposing half-built features

// COMPARISON

Tool	Type	Strength	Best for
LaunchDarkly	Commercial	Detailed targeting, fast propagation	Enterprise default; the mature standard
Unleash	Open source (+ paid)	Progressive delivery, self-host	Teams wanting OSS with enterprise features
GrowthBook	Open source (+ paid)	Experimentation + flags, modular	Experiment-led teams; local evaluation
Flagsmith	Open source (+ paid)	Flexible deploy, on-prem option	Teams needing self-hosted control
Flipt	Open source	Git-native, lightweight	GitOps-style flag management

// OPEN SOURCE VS PAID

This space has a strong open-source tier and an enterprise leader. LaunchDarkly is the commercial default — deep targeting, fast flag propagation, enterprise governance — and the tool many teams start with, until a renewal prompts the "could we self-host this?" question. The open-source contenders have matured into serious answers: Unleash, GrowthBook, Flagsmith, and Flipt all self-host, each with a different lean — Unleash on progressive delivery, GrowthBook on experimentation, Flagsmith on flexible/on-prem deployment, Flipt on a lightweight Git-native model. Most offer a paid hosted tier on top of the free core. Worth knowing: OpenFeature (CNCF) is the vendor-neutral SDK standard — instrument against it and you can switch providers without rewriting your flag code, the same lock-in hedge OpenTelemetry is for observability. For learners and small teams, a self-hosted Unleash or GrowthBook, or env-var flags graduating to a platform via OpenFeature, covers the ground at no licence cost.
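The "env-var flags graduating to a platform" path can be sketched as application code that depends on a small provider interface rather than on a specific backend. The interface below is a hypothetical stand-in written for this example; it is not the OpenFeature SDK API, and the FLAG_ prefix and class names are assumptions.

```python
import os
from typing import Mapping, Protocol


class FlagProvider(Protocol):
    """Hypothetical minimal interface the application codes against."""

    def get_boolean(self, key: str, default: bool) -> bool: ...


class EnvVarProvider:
    """Reads flags from variables such as FLAG_NEW_CHECKOUT=true."""

    def __init__(self, environ: Mapping[str, str] = os.environ) -> None:
        self._environ = environ

    def get_boolean(self, key: str, default: bool) -> bool:
        raw = self._environ.get("FLAG_" + key.upper().replace("-", "_"))
        if raw is None:
            return default  # unset variable: fall back to the caller's default
        return raw.strip().lower() in {"1", "true", "on", "yes"}


class StaticProvider:
    """Stand-in for a platform-backed provider; values are fixed in memory."""

    def __init__(self, values: Mapping[str, bool]) -> None:
        self._values = dict(values)

    def get_boolean(self, key: str, default: bool) -> bool:
        return self._values.get(key, default)


def render_checkout(flags: FlagProvider) -> str:
    """Application code: knows the flag key, not where the value comes from."""
    if flags.get_boolean("new-checkout", False):
        return "new checkout"
    return "old checkout"
```

Swapping EnvVarProvider for a platform-backed provider then changes one line of wiring rather than every flag check, which is the same separation a vendor-neutral SDK aims for.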

// HOW TO CHOOSE

01 Flags, experiments, or both? Pure release control and kill switches → Unleash, Flagsmith, LaunchDarkly. Heavy A/B testing and metrics → GrowthBook or a platform with strong experimentation. Many teams need both, but the emphasis shifts the choice.
02 Self-host or managed? If data control or cost is the driver, the open-source platforms (Unleash, GrowthBook, Flagsmith, Flipt) self-host. If you want it handled and have budget, LaunchDarkly is the polished managed option.
03 Instrument vendor-neutral. Whatever you pick, evaluate flags through OpenFeature where you can: it keeps you from rewriting flag code if you switch providers later.
04 How big is the team? Small teams (≤15) do well on Flagsmith/GrowthBook with light governance; mid-size teams should evaluate LaunchDarkly/Statsig with proper RBAC and pipeline checks; large orgs need a governance framework and lifecycle automation, not just a tool.
05 Plan for flag lifecycle from day one. The tool is the easy part; the discipline (naming, ownership, sunset policies) is what determines whether flags help or rot. Pick a tool whose workflow supports retiring flags, and use it.
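The lifecycle discipline in step 05 (naming, ownership, sunset policies) can be enforced with a small check that runs in CI. This is a sketch under stated assumptions: the registry shape, the naming convention, and the function name are invented for illustration and are not taken from any of the tools above.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical convention: <type>-<description>, e.g. release-new-checkout.
NAME_PATTERN = re.compile(r"^(release|experiment|ops)-[a-z0-9]+(-[a-z0-9]+)*$")


@dataclass(frozen=True)
class FlagRecord:
    key: str
    owner: str    # team or person accountable for removing the flag
    sunset: date  # date by which the flag should be gone from the code


def lifecycle_problems(registry, today):
    """Return one human-readable message per policy violation."""
    problems = []
    for flag in registry:
        if not NAME_PATTERN.match(flag.key):
            problems.append(f"{flag.key}: name does not match convention")
        if not flag.owner:
            problems.append(f"{flag.key}: no owner")
        if flag.sunset < today:
            problems.append(
                f"{flag.key}: past sunset date {flag.sunset.isoformat()}")
    return problems
```

Failing the build when the list is non-empty is one way to keep retired flags from lingering; whether to block or only warn is a team policy choice.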

// COMMON MISTAKES

Not testing the off path (and the rollback). A flag has at least two states; teams test the new feature on and forget the experience with it off — which is exactly the state they'll fall back to in an incident. Test both, and test the flip.
Combinatorial flag explosion left untested. Several interacting flags create many combinations. You can't test all of them — but teams often test none of the interactions, then get surprised when two flags conflict in production.
Flag debt. Flags left in the code long after a feature is fully rolled out become permanent complexity — dead branches, confusing logic, risk. Without sunset policies, flag debt compounds. Retire flags once they've served their purpose.
Treating "behind a flag" as "tested." Shipping code dark is not the same as validating it. A flag controls exposure, not quality — the feature still needs testing before you flip it on for users.
No governance on who can flip what. A flag that anyone can toggle in production is an incident waiting to happen. Targeting and kill switches need access control and an audit trail, especially at scale.
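Two of the mistakes above, skipping the off path and leaving toggles ungoverned, can be illustrated together. The in-memory store below is a hypothetical sketch, not any platform's API: it defaults unknown flags to off, rejects changes from actors who are not listed as editors, and records every successful change.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FlagStore:
    flags: dict = field(default_factory=dict)
    editors: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def is_on(self, key: str) -> bool:
        # Unknown flags evaluate to off, the state an incident falls back to.
        return self.flags.get(key, False)

    def set_flag(self, key: str, value: bool, actor: str) -> None:
        if actor not in self.editors:
            raise PermissionError(f"{actor} may not change {key}")
        self.audit_log.append((datetime.now(timezone.utc), actor, key, value))
        self.flags[key] = value


def checkout_total(store: FlagStore, subtotal: float) -> float:
    """Hypothetical feature: a 10% discount behind the 'new-discount' flag."""
    if store.is_on("new-discount"):
        return round(subtotal * 0.9, 2)
    return subtotal
```

A test for a feature like this would assert the result with the flag off, with it on, and again after flipping it back off, so the rollback path is exercised before it is needed in production.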
