LangSmith
Hosted platform from LangChain for tracing, evaluating, and monitoring LLM applications.
Pricing
Freemium
Type
Automation
Languages
Python, JavaScript, TypeScript
// VERDICT
Reach for LangSmith when you want hosted tracing plus evaluation for LLM apps - debugging runs and gating on eval scores. Skip it when you want open-source self-hosting (Langfuse/Phoenix) or code-only evals without a platform.
Best for
LangChain's platform for tracing, evaluating and monitoring LLM apps - capture every run, build eval datasets, score outputs and debug chains/agents, whether or not you use LangChain.
Avoid when
You want a fully open-source self-hosted tool, or a lightweight code-only eval without a platform.
CI/CD fit
SDK tracing · eval datasets · CI eval runs
Team fit
LLM app teams · LangChain users (and non-users) · Dev/QA debugging + evaluating LLMs
// BEST FOR
- Tracing every LLM/chain/agent run for debugging
- Building eval datasets from traced runs
- Scoring outputs (code or LLM-as-judge) against datasets
- Monitoring LLM apps in production
- Works with or without LangChain
- Catching regressions when prompts/models change
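
A minimal sketch of tracing code that does not use LangChain. The `langsmith` Python SDK exports a `traceable` decorator; once the tracing environment variable and API key from the quick start are set, each call to the wrapped function should be recorded as a run. The `summarize` function is a hypothetical stand-in for a real model call, and the `ImportError` fallback exists only so the sketch runs without the SDK installed.

```python
try:
    # `traceable` is the decorator exported by the langsmith Python SDK.
    from langsmith import traceable
except ImportError:
    # Assumption for this sketch only: a no-op stand-in when the SDK is absent.
    def traceable(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as a bare @traceable
        return lambda fn: fn  # used as @traceable(...)


@traceable(name="summarize")
def summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call: return the first sentence."""
    return text.split(".")[0].strip() + "."


if __name__ == "__main__":
    # With tracing disabled the decorated function behaves like the plain one.
    print(summarize("LangSmith records runs. It also scores them."))
```

Nothing in the function body depends on LangChain, which is the point of the "with or without" claim: the decorator wraps ordinary Python callables.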
// AVOID WHEN
- You require fully open-source self-hosting
- A lightweight code-only eval is enough
- You can't send traces to a hosted service
- You're not building LLM apps
- You only need prompt comparison (PromptFoo fits better)
- Turnkey on-prem is mandatory
// QUICK START
pip install langsmith # or npm i langsmith
# set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY -> runs are traced;
# define datasets + evaluators, run evals in CI
// ALTERNATIVES TO CONSIDER
| Tool | Choose it when |
|---|---|
| Langfuse | You want open-source, self-hostable tracing + eval. |
| Arize Phoenix | You want open-source observability with OpenTelemetry. |
| Braintrust | You want a managed eval-first platform with datasets. |
// FEATURES
- Distributed tracing for chains, agents, and tool calls
- Datasets and evaluation runs with custom evaluators
- Prompt playground with versioning and side-by-side compare
- Production monitoring with feedback capture
- Annotation queues for human review
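
To make "datasets and evaluation runs with custom evaluators" concrete, here is a platform-free sketch in plain Python. It is not the LangSmith API: `toy_app`, `DATASET`, `exact_match`, and `run_eval` are hypothetical names, and the `{"key": ..., "score": ...}` result shape is an assumption chosen for illustration. It shows the basic loop a hosted eval run performs: call the app on each example, score the output against the reference, and aggregate.

```python
from statistics import mean

# Hypothetical dataset: each example pairs inputs with a reference output.
DATASET = [
    {"inputs": {"question": "capital of France?"}, "outputs": {"answer": "paris"}},
    {"inputs": {"question": "capital of Peru?"}, "outputs": {"answer": "Lima"}},
]


def toy_app(inputs: dict) -> dict:
    """Hypothetical stand-in for the LLM app under test."""
    known = {"capital of France?": "Paris"}
    return {"answer": known.get(inputs["question"], "unknown")}


def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Code evaluator: 1.0 when the normalized answers match, else 0.0."""
    predicted = outputs.get("answer", "").strip().lower()
    expected = reference_outputs.get("answer", "").strip().lower()
    return {"key": "exact_match", "score": 1.0 if predicted == expected else 0.0}


def run_eval(target, dataset, evaluator) -> float:
    """Run the target on every example and return the mean evaluator score."""
    scores = []
    for example in dataset:
        outputs = target(example["inputs"])
        scores.append(evaluator(outputs, example["outputs"])["score"])
    return mean(scores)


if __name__ == "__main__":
    print(run_eval(toy_app, DATASET, exact_match))
```

An LLM-as-judge evaluator would keep the same signature and replace the string comparison with a model call that returns a score.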
// PROS
- Best-in-class tracing UX for LangChain and LangGraph apps
- Works with non-LangChain code via the SDK
- Generous free tier for individual developers
- Tight loop between debugging traces and turning failures into evals
// CONS
- Closed-source SaaS; self-hosting is limited to the enterprise tier
- Pricing scales with trace volume, so costs can rise unexpectedly as traffic grows
- The smoothest integration is reserved for the LangChain ecosystem
// EXAMPLE QA WORKFLOW
1. Enable LangSmith tracing via SDK/env
2. Capture runs from your LLM app
3. Build eval datasets from traced runs
4. Define evaluators and score outputs
5. Gate CI on eval scores/regressions
6. Monitor production and feed back new cases
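
The CI gating step in the workflow above can be sketched as a small check over aggregate eval scores. This is plain Python under stated assumptions, not a LangSmith feature: `find_failures`, the metric names, and the `min_score` and `max_drop` thresholds are all hypothetical, and the scores are assumed to have been exported from an eval run as simple name-to-value mappings.

```python
def find_failures(current, baseline, min_score=0.8, max_drop=0.05):
    """Return human-readable reasons the current eval scores should fail CI.

    A metric fails if it is below the absolute floor (min_score) or if it
    regressed from the baseline by more than max_drop.
    """
    failures = []
    for metric, score in sorted(current.items()):
        if score < min_score:
            failures.append(f"{metric}: {score:.2f} is below the floor {min_score:.2f}")
        previous = baseline.get(metric)
        if previous is not None and previous - score > max_drop:
            failures.append(
                f"{metric}: dropped {previous - score:.2f} from baseline {previous:.2f}"
            )
    return failures


if __name__ == "__main__":
    # Hypothetical scores: baseline from the main branch, current from this change.
    baseline = {"exact_match": 0.90, "helpfulness": 0.85}
    current = {"exact_match": 0.91, "helpfulness": 0.70}
    problems = find_failures(current, baseline)
    for line in problems:
        print("FAIL", line)
    # A CI job would pass this value to sys.exit so a regression fails the build.
    print("exit code:", 1 if problems else 0)
```

Checking both an absolute floor and a relative drop catches two different problems: a metric that was never good enough, and a metric that a prompt or model change made worse.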