DSPy
A framework for programming (not prompting) LLMs — declare signatures and modules, then let an optimizer compile prompts and few-shot examples for you.
Category
LLM & Agent Frameworks
Difficulty
Advanced
When to use
You have a well-defined task with examples and want the framework to automatically search over prompts, few-shot demos, and even fine-tunes.
When not to use
You have no labeled examples or eval metric — DSPy's superpower is optimization, and without data there's nothing to optimize.
Alternatives
LangChain, raw prompt engineering, TextGrad
At a glance
| Field | Value |
|---|---|
| Category | LLM programming framework |
| Difficulty | Advanced |
| When to use | Tasks with examples and metrics to optimize |
| When not to use | Ad hoc prompting with no labeled data |
| Alternatives | LangChain, TextGrad, hand-tuned prompts |
What it is
DSPy (from Stanford NLP) treats LLM pipelines like programs. You declare Signature types (input fields → output fields) and compose Modules (Predict, ChainOfThought, ReAct). A compiler then searches — using your training examples and metric — for the best few-shot demonstrations, prompt templates, or fine-tuned weights. The net effect is “stop hand-crafting prompts; let the optimizer handle it”.
When we reach for it at Ephizen
- Multi-step pipelines where individual prompts are tangled and brittle.
- Tasks where we have a good eval set and want reproducible improvements over time.
- Porting a prompt-heavy pipeline to a smaller, cheaper model via prompt search.
Getting started
```python
import dspy

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class ExtractFacts(dspy.Signature):
    """Extract a list of facts from the text."""

    text: str = dspy.InputField()
    facts: list[str] = dspy.OutputField()

extract = dspy.ChainOfThought(ExtractFacts)
print(extract(text="The capital of France is Paris.").facts)
```
Gotchas
- The mental model is different from LangChain. Expect a real ramp-up.
- The optimizer can burn a lot of tokens during compilation — use a cheap model for optimization passes.
- DSPy’s best results come from iterating on the metric, not the prompt. If your metric is bad, DSPy will cheerfully optimize the wrong thing.
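As a concrete illustration of metric iteration: an exact-match metric gives no partial credit, so the optimizer cannot distinguish "almost right" from "completely wrong". A set-level F1 over extracted facts — a hypothetical metric for the `ExtractFacts` example above, following DSPy's `metric(example, pred, trace=None)` convention — usually gives the search a smoother signal:

```python
def facts_f1(example, pred, trace=None):
    """Set-level F1 between gold and predicted facts: partial credit
    for overlap instead of all-or-nothing exact match."""
    gold, guess = set(example.facts), set(pred.facts)
    if not gold or not guess:
        return 0.0
    tp = len(gold & guess)          # facts present in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(guess)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```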
Related tools
- HuggingFace Transformers — The library that made pretrained transformers trivially loadable — from BERT to Llama — with a consistent API across tasks.
- LangChain — A Python/JS framework for composing LLM calls, prompts, tools, and memory into end-to-end applications.
- LangGraph — A state-machine library from the LangChain team for building controllable, stateful LLM agents as explicit graphs of nodes and edges.