MLflow
Open-source platform for the full ML lifecycle — experiments, models, and deployments.
Pricing
Free / Open source
Type
Automation
Languages
Python, Java
// VERDICT
Reach for MLflow when you want open-source experiment tracking, a model registry and lifecycle management (plus LLM eval). Skip it when you only need lightweight prompt/LLM evals or aren't tracking ML experiments.
Best for
Teams that want one open-source platform for the ML lifecycle: experiment tracking, a model registry, packaging and deployment, plus newer LLM evaluation features.
Avoid when
You only need LLM prompt evals (lighter tools fit), or you're not doing ML experiment tracking.
CI/CD fit
Tracking server / SDK · self-host or managed · CI logging
Languages
Python · Java
Team fit
ML/data-science teams · MLOps · Teams managing model lifecycle
// BEST FOR
- Tracking experiments (params, metrics, artifacts)
- A model registry for versioning and stage transitions
- Packaging and deploying models
- LLM evaluation features alongside ML
- Open-source and self-hostable
- Reproducible ML runs
// AVOID WHEN
- You only need lightweight LLM prompt evals
- You're not doing ML experiment tracking
- A hosted-only platform is preferred
- No-code is required
- Minimal setup is essential
- Your work is purely prompt-engineering
// QUICK START
pip install mlflow
mlflow server # tracking server
# log params/metrics/artifacts from training/eval; use the model registry
// ALTERNATIVES TO CONSIDER
| Tool | Choose it when |
|---|---|
| Weights & Biases | You want a managed experiment-tracking platform with rich UI. |
| Great Expectations | Your need is data validation rather than experiment tracking. |
| LangSmith | You're focused on LLM tracing + eval, not ML lifecycle. |
// FEATURES
- Experiment tracking for parameters, metrics, and artifacts
- Model registry with versioning and stage transitions (recent MLflow releases deprecate stages in favor of model version aliases)
- Reproducible runs via MLflow Projects
- Model packaging and deployment to multiple targets
- LLM evaluation with prompt and tracing support
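A minimal sketch of the experiment-tracking feature, assuming a tracking server is reachable at MLflow's default local address. The experiment name `demo-experiment` and the helpers `flatten_params` and `log_training_run` are hypothetical names chosen for illustration; the `mlflow.*` calls are part of the standard Python API.

```python
from __future__ import annotations

from typing import Any


def flatten_params(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested config into dot-separated keys.

    MLflow params are flat key/value pairs, so nested training configs
    are flattened before logging. (Hypothetical helper, not part of MLflow.)
    """
    flat: dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_training_run(
    config: dict[str, Any],
    metrics: dict[str, float],
    artifact_path: str | None = None,
    tracking_uri: str = "http://127.0.0.1:5000",  # assumption: local `mlflow server` default
) -> str:
    """Log one run's params, metrics and an optional artifact; return the run ID."""
    import mlflow  # imported lazily so flatten_params works without MLflow installed

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment("demo-experiment")  # hypothetical experiment name
    with mlflow.start_run() as run:
        mlflow.log_params(flatten_params(config))
        mlflow.log_metrics(metrics)
        if artifact_path is not None:
            mlflow.log_artifact(artifact_path)  # uploads a local file
        return run.info.run_id
```

Calling `log_training_run({"lr": 0.01, "model": {"depth": 3}}, {"accuracy": 0.91})` would record the params `lr` and `model.depth` plus one metric on the run.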
// PROS
- Vendor-neutral and integrates with major training frameworks
- Backed by Databricks with strong enterprise adoption
- Self-hostable with no cloud lock-in
- Well-established community and ecosystem
// CONS
- UI feels dated compared to newer tracking platforms
- Self-hosting at scale requires non-trivial infrastructure
- Less polished for pure-LLM-app workflows than purpose-built tools
// EXAMPLE QA WORKFLOW
- Run an MLflow tracking server (or managed)
- Log experiments (params, metrics, artifacts)
- Register and version models
- Promote models through stages
- Log LLM/ML eval results in CI
- Gate on metrics; manage artifact storage
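The "gate on metrics" step could be sketched roughly as below, assuming the CI job knows the run ID of the evaluation it just logged. The function names and the thresholds in the usage note are hypothetical; `MlflowClient.get_run(...).data.metrics` is the standard way to read a run's latest metric values.

```python
import sys


def failed_gates(metrics: dict, minimums: dict) -> list:
    """Return one message per metric that is missing or below its minimum."""
    failures = []
    for name, floor in minimums.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif value < floor:
            failures.append(f"{name}: {value} < {floor}")
    return failures


def gate_run(run_id: str, minimums: dict) -> int:
    """Fetch a run's metrics from the tracking server; return a CI exit code."""
    from mlflow.tracking import MlflowClient  # lazy import: only needed in CI

    metrics = MlflowClient().get_run(run_id).data.metrics
    failures = failed_gates(metrics, minimums)
    for message in failures:
        print(f"GATE FAILED {message}", file=sys.stderr)
    return 1 if failures else 0
```

In a pipeline this might end with `sys.exit(gate_run(run_id, {"accuracy": 0.9}))`, so a run that misses the threshold fails the job. The tracking server address is taken from the `MLFLOW_TRACKING_URI` environment variable when it is set.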