Arize Phoenix
Open-source LLM observability from Arize. Phoenix uses OpenInference, a widely adopted set of OpenTelemetry semantic conventions for LLM spans, so instrumentation is portable across backends. Licensed under the Elastic License 2.0. It is notebook-friendly, with a strong eval harness (Phoenix Evals) and embedding drift detection that is distinctive among open-source options.
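OpenInference conventions are standardized attribute keys on ordinary OpenTelemetry spans. That is what makes instrumentation portable: any backend that understands the keys can read the span. A minimal sketch, assuming a plain dataclass stands in for an OpenTelemetry SDK span so it runs without dependencies; the attribute keys follow OpenInference naming as we understand it (check the OpenInference spec before relying on them), and the model name and token counts are made up.

```python
from dataclasses import dataclass, field
from typing import Any

# Key that OpenInference uses to mark what kind of LLM-app step a span represents.
SPAN_KIND = "openinference.span.kind"


@dataclass
class Span:
    """Illustrative stand-in for an OpenTelemetry span: a name plus flat attributes."""
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)


def llm_span(model: str, prompt: str, completion: str,
             prompt_tokens: int, completion_tokens: int) -> Span:
    """Build an LLM span whose attributes an OpenInference-aware backend could read."""
    return Span(
        name="llm.call",
        attributes={
            SPAN_KIND: "LLM",
            "llm.model_name": model,
            "input.value": prompt,
            "output.value": completion,
            "llm.token_count.prompt": prompt_tokens,
            "llm.token_count.completion": completion_tokens,
            "llm.token_count.total": prompt_tokens + completion_tokens,
        },
    )


# Hypothetical model name and token counts, for illustration only.
span = llm_span("example-model", "What is RAG?", "Retrieval-augmented generation...", 5, 12)
print(span.attributes["llm.token_count.total"])  # 17
```

Because the backend only sees these keys, switching from Phoenix to another OpenInference-aware collector changes where spans are exported, not how they are built.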
Pricing
Freemium
Type
Automation
Languages
Python, TypeScript
// VERDICT
Reach for Arize Phoenix when you want open-source, OpenTelemetry-based tracing and evaluation for LLM/ML apps, runnable locally. Skip it when you want a fully managed platform or just config-driven prompt tests.
Best for
Open-source observability and evaluation for LLM and ML apps: OpenTelemetry-based tracing, evaluation and drift analysis that you can run locally or self-host.
Avoid when
You want a fully managed vendor platform, or a simple config-only prompt tester.
CI/CD fit
OpenTelemetry tracing · local/self-host · eval runs
Team fit
LLM/ML app teams · Teams wanting OSS observability · Dev/QA debugging + evaluating
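The eval runs listed under CI/CD fit usually end in a pass/fail gate. A minimal sketch, assuming eval results have already been exported as a list of labels; `pass_rate`, `ci_gate`, the `"correct"` label and the 0.7 threshold are all hypothetical and not part of Phoenix.

```python
def pass_rate(labels: list[str], passing: str = "correct") -> float:
    """Fraction of eval labels that equal the passing label."""
    if not labels:
        raise ValueError("no eval results to gate on")
    return sum(1 for label in labels if label == passing) / len(labels)


def ci_gate(labels: list[str], threshold: float = 0.9) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    rate = pass_rate(labels)
    print(f"eval pass rate: {rate:.2%} (threshold {threshold:.0%})")
    return 0 if rate >= threshold else 1


# Hypothetical labels from one eval run.
results = ["correct", "correct", "incorrect", "correct"]
exit_code = ci_gate(results, threshold=0.7)
```

In a CI job you would finish with `raise SystemExit(ci_gate(labels))`, so a pass rate below the threshold fails the build.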
// BEST FOR
- Open-source LLM/ML tracing via OpenTelemetry
- Running locally or self-hosted (no vendor lock-in)
- Evaluating outputs and analysing drift
- Debugging chains/agents from spans
- Inspecting RAG retrieval and embeddings
- Feeding traced failures into eval sets
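LLM-as-judge evals of the kind Phoenix Evals runs constrain the judge's free-text answer to a fixed set of allowed labels (Phoenix calls these "rails"). A minimal sketch of that idea, assuming a hypothetical `fake_judge` keyword check in place of a real model call; `snap_to_rails` and `evaluate` are illustrative names, not the Phoenix Evals API.

```python
from typing import Callable, Optional

# Allowed output labels for a retrieval-relevance eval.
RAILS = ["relevant", "unrelated"]


def snap_to_rails(raw: str, rails: list[str]) -> Optional[str]:
    """Map free-text judge output onto exactly one allowed label, or None if unparsable."""
    text = raw.strip().lower()
    matches = [rail for rail in rails if rail in text]
    return matches[0] if len(matches) == 1 else None


def evaluate(rows: list[tuple[str, str]],
             judge: Callable[[str, str], str],
             rails: list[str] = RAILS) -> list[Optional[str]]:
    """Label each (query, document) row with the judge, constrained to the rails."""
    return [snap_to_rails(judge(query, doc), rails) for query, doc in rows]


def fake_judge(query: str, document: str) -> str:
    """Hypothetical stand-in for an LLM judge: word overlap instead of a model call."""
    overlap = set(query.lower().split()) & set(document.lower().split())
    return "Relevant" if overlap else "unrelated"


rows = [
    ("what is phoenix", "Phoenix is an LLM observability tool"),
    ("reset my password", "Quarterly revenue grew 4%"),
]
labels = evaluate(rows, fake_judge)
print(labels)  # ['relevant', 'unrelated']
```

Substring matching is deliberately naive: a rail such as "relevant" would also match "irrelevant", so rails should be chosen (or parsed) to avoid overlaps.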
// AVOID WHEN
- You want a fully managed vendor platform
- A simple config-only prompt tester is enough
- You can't run/host the tool
- You're not building LLM/ML apps
- Turnkey enterprise support is essential
- Only manual eval is needed
// QUICK START
pip install arize-phoenix
phoenix serve   # launch Phoenix locally; the UI defaults to http://localhost:6006
# instrument the app via OpenTelemetry (OpenInference conventions)
# then inspect traces, run evals and analyse drift in the UI
// FEATURES
- OpenTelemetry-native with OpenInference semantic conventions — instrument once, send anywhere
- Phoenix Evals — research-backed metrics covering agents, RAG, and safety
- Embedding drift detection and RAG-specific quality metrics
- Notebook-first workflow; runs in Colab or locally for rapid experimentation
- Free open-source Phoenix; commercial Arize AX for enterprise scale
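One common way to put a number on embedding drift is the Euclidean distance between the mean embedding of a baseline window and that of a current window. A minimal stdlib sketch of that measure, not Phoenix's own implementation; the toy 2-D vectors and `DRIFT_THRESHOLD` are hypothetical (real embeddings have hundreds of dimensions).

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equally sized embedding vectors."""
    if not vectors:
        raise ValueError("need at least one embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def centroid_drift(baseline: list[list[float]], current: list[list[float]]) -> float:
    """Euclidean distance between the centroids of two embedding sets."""
    return math.dist(centroid(baseline), centroid(current))


# Toy data: the current window has clearly moved away from the baseline.
baseline = [[0.0, 1.0], [0.0, 0.8], [0.2, 0.9]]
current = [[0.6, 0.2], [0.8, 0.0], [0.7, 0.1]]

DRIFT_THRESHOLD = 0.5  # hypothetical alert threshold; tune per model and data
drift = centroid_drift(baseline, current)
print(f"drift={drift:.3f}, alert={drift > DRIFT_THRESHOLD}")
```

A centroid distance is cheap and easy to chart over time, but it can miss changes in spread that leave the mean in place, which is why tools also offer cluster and UMAP views.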
// PROS
- OpenInference compatibility means no re-instrumentation if you ever migrate backends
- One of the most extensive eval-metric libraries among open-source options
- Free Phoenix tier is unrestricted for self-hosting
// CONS
- Trace UX is span-tree-first — no transcript view for long agent runs
- Less purpose-built for production agent debugging than Laminar
- Graduating to commercial Arize AX has a different cost curve — plan ahead if you need enterprise scale
// EXAMPLE QA WORKFLOW
Install Phoenix (pip) and launch it locally or self-host it
Instrument the app with OpenTelemetry
Capture traces of LLM/RAG runs
Evaluate outputs and analyse drift
Debug failures from spans
Feed traced failures into eval datasets
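The last two workflow steps (debug from spans, feed failures into eval datasets) can be sketched as follows. A minimal sketch under stated assumptions: the `Span` dataclass, the `"OK"`/`"ERROR"` status strings (modelled loosely on OpenTelemetry status codes), the JSONL field names `input`, `expected` and `source_span`, and the trace itself are hypothetical, not Phoenix's export format; the trace is assumed to be a proper tree.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    status: str  # "OK" or "ERROR"
    attributes: dict = field(default_factory=dict)


def error_paths(spans: list[Span]) -> list[list[str]]:
    """For every errored span, return the chain of span names from the root down to it."""
    by_id = {s.span_id: s for s in spans}
    paths = []
    for span in spans:
        if span.status != "ERROR":
            continue
        path, cur = [], span
        while cur is not None:  # walk parent links up to the root
            path.append(cur.name)
            cur = by_id.get(cur.parent_id) if cur.parent_id else None
        paths.append(list(reversed(path)))
    return paths


def failures_to_eval_rows(spans: list[Span]) -> list[str]:
    """Turn errored spans into JSONL rows with an empty slot for a reference answer."""
    return [
        json.dumps({"input": s.attributes.get("input.value"),
                    "expected": None,
                    "source_span": s.span_id})
        for s in spans if s.status == "ERROR"
    ]


# Hypothetical agent trace in which the booking tool failed.
trace = [
    Span("1", None, "agent.run", "OK", {"input.value": "Book a flight to Oslo"}),
    Span("2", "1", "retriever", "OK"),
    Span("3", "1", "tool.book_flight", "ERROR", {"input.value": "Book a flight to Oslo"}),
]
print(error_paths(trace))  # [['agent.run', 'tool.book_flight']]
for row in failures_to_eval_rows(trace):
    print(row)
```

Keeping `source_span` on each row preserves the link back to the original trace, so a reviewer can fill in `expected` with the failing context at hand.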