LangSmith
Tracing, evaluation, and monitoring platform for LLM applications, from the LangChain team.
Category
MLOps
Difficulty
Intermediate
When to use
Debugging, evaluating, and monitoring LLM chains or agents in dev and production — especially if you're already on LangChain/LangGraph.
When not to use
Your LLM calls are simple and direct — OpenTelemetry + a basic log sink is enough.
Alternatives
LangFuse, Arize Phoenix, Weights & Biases Prompts, Helicone
At a glance
| Field | Value |
|---|---|
| Category | LLM observability / eval |
| Difficulty | Intermediate |
| When to use | Tracing and evaluating LLM apps, especially LangChain |
| When not to use | Trivial one-shot LLM features |
| Alternatives | LangFuse, Arize Phoenix, Helicone |
What it is
LangSmith captures every LLM call, tool invocation, and chain step as a trace you can replay, inspect, and evaluate. It pairs with a dataset/eval system so you can run regression tests against a fixed set of inputs and grade outputs with LLM-as-judge or custom Python evaluators.
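Custom evaluators are plain Python callables. A minimal sketch, assuming a function that receives a run's outputs and the golden example's reference outputs and returns a keyed score; the names and the exact-match grading logic here are our own illustration, not a fixed LangSmith API:

```python
# Hedged sketch of a custom Python evaluator: compare a run's output
# to a reference example and return a dict with a metric key and score.
# The normalisation and exact-match rule are illustrative choices.

def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Grade a single run against its golden example."""
    predicted = outputs.get("answer", "").strip().lower()
    expected = reference_outputs.get("answer", "").strip().lower()
    return {"key": "exact_match", "score": float(predicted == expected)}

# What the eval harness would do for each dataset row:
result = exact_match({"answer": " Paris "}, {"answer": "paris"})
print(result)  # {'key': 'exact_match', 'score': 1.0}
```

LLM-as-judge evaluators have the same shape; they just call a model inside the function to produce the score instead of comparing strings.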
When we reach for it at Ephizen
- Debugging why a LangGraph agent took a wrong tool call.
- Running nightly regression evals against a golden dataset when we change a prompt or model.
- Monitoring production latency, cost, and error rate per chain.
- Sharing a failing trace with a teammate as a link.
Getting started
```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=...
export LANGCHAIN_PROJECT=ephizen-rag
```

Any LangChain / LangGraph run in that process now streams traces to LangSmith with no code changes. For non-LangChain code, the `langsmith` SDK provides a `@traceable` decorator.
Gotchas
- Traces include full prompt and response bodies; scrub PII before sending, or use a self-hosted deployment.
- Evaluators that call an LLM add cost and latency; budget accordingly.
- LangFuse is a strong open-source alternative if you want to avoid vendor lock-in.
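On the PII point above, a minimal stdlib sketch of scrubbing prompt text before it reaches a hosted tracing backend; the two patterns (email, US-style phone) are illustrative and nowhere near an exhaustive PII policy:

```python
# Hedged sketch: redact obvious PII from text before it is attached to
# a trace. Real deployments need a vetted pattern set, not these two.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace each match with a [REDACTED:<kind>] placeholder."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text

print(scrub("Contact jane.doe@example.com or 555-867-5309."))
# → Contact [REDACTED:EMAIL] or [REDACTED:PHONE].
```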