Ragas

An evaluation framework for retrieval-augmented generation systems — faithfulness, answer relevance, context precision, and more.

Category
MLOps
Difficulty
Intermediate
When to use
You have a RAG pipeline and need quantitative metrics beyond eyeballing outputs, especially for regression testing and comparison.
When not to use
You have no eval set at all yet — build one first, then reach for Ragas.
Alternatives
TruLens, DeepEval, LangSmith evals, custom LLM-as-judge

At a glance

Field            Value
Category         RAG evaluation framework
Difficulty       Intermediate
When to use      Measuring RAG quality over a fixed dataset
When not to use  You have no labeled or golden examples yet
Alternatives     TruLens, DeepEval, LangSmith evals

What it is

Ragas provides a set of RAG-specific metrics — faithfulness (does the answer stay grounded in retrieved context), answer relevance, context precision, context recall — that use an LLM as a judge under the hood. You feed in a dataset of (question, answer, contexts, ground_truth) rows and Ragas scores each row on each metric.
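To make the faithfulness idea concrete, here is a minimal sketch of a faithfulness-style score: the fraction of claims in an answer that a judge marks as supported by the retrieved contexts. This is an illustration, not Ragas's implementation. The names `faithfulness_score` and `substring_judge` are hypothetical, and the substring check is only a stand-in for the LLM judge that Ragas would call.

```python
from typing import Callable, List


def faithfulness_score(
    claims: List[str],
    contexts: List[str],
    is_supported: Callable[[str, List[str]], bool],
) -> float:
    """Fraction of answer claims that the judge marks as supported by the contexts."""
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if is_supported(claim, contexts))
    return supported / len(claims)


def substring_judge(claim: str, contexts: List[str]) -> bool:
    # Stand-in for an LLM judge (assumption for this sketch): a claim counts as
    # supported only if it appears verbatim, ignoring case, in some context.
    return any(claim.lower() in ctx.lower() for ctx in contexts)


contexts = ["Paris is the capital of France. It lies on the Seine."]
claims = ["Paris is the capital of France", "Paris has 10 million residents"]

score = faithfulness_score(claims, contexts, substring_judge)
print(score)  # one of the two claims is supported
```

In a real run, both the claim extraction and the support check are done by the judge model, which is why the score inherits that model's noise.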

When we reach for it at Ephizen

  • Before and after any change to chunking, embedding model, or reranker.
  • A/B comparing retrieval strategies on the same golden questions.
  • Catching regressions when we swap LLM providers or models.
  • Generating a scorecard that non-ML stakeholders can actually read.
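The before/after and regression-catching uses above boil down to comparing two scorecards of per-metric means. A minimal sketch of that comparison follows, assuming each scorecard is a plain dict of metric name to mean score. The helper `find_regressions`, the `tolerance` default, and the numbers are all illustrative, not real results.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,
) -> List[str]:
    """Return a readable line for each metric that dropped by more than `tolerance`."""
    regressions = []
    for metric, base in baseline.items():
        if metric not in candidate:
            continue  # metric was not evaluated in the candidate run
        drop = base - candidate[metric]
        if drop > tolerance:
            regressions.append(f"{metric}: {base:.2f} -> {candidate[metric]:.2f}")
    return regressions


# Made-up scores for illustration only.
baseline = {"faithfulness": 0.91, "answer_relevancy": 0.88, "context_precision": 0.80}
candidate = {"faithfulness": 0.84, "answer_relevancy": 0.89, "context_precision": 0.79}

regressions = find_regressions(baseline, candidate)
for line in regressions:
    print(line)
```

The tolerance exists because small differences between runs are often judge noise rather than a real change, so it should be tuned against how much the scores move between identical runs.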

Getting started

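from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# One entry per eval example; all four columns must have the same length.
ds = Dataset.from_dict({
    "question": [...],       # list[str]
    "answer": [...],         # list[str], the pipeline's generated answers
    "contexts": [...],       # list[list[str]], retrieved chunks per question
    "ground_truth": [...],   # list[str], reference answers
})
# Scores every row on every metric. This calls the judge LLM, so its
# credentials must be configured before running.
result = evaluate(ds, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)  # aggregate score per metric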
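Pipelines usually produce one record per question, while the snippet above wants one list per column. A minimal sketch of that conversion is below, assuming the four column names used above. `EvalRow` and `to_columns` are hypothetical helpers, not part of Ragas or `datasets`, and the sample row is invented.

```python
from typing import Dict, List, TypedDict


class EvalRow(TypedDict):
    question: str
    answer: str
    contexts: List[str]
    ground_truth: str


def to_columns(rows: List[EvalRow]) -> Dict[str, list]:
    """Turn per-question records into the dict-of-columns shape."""
    columns: Dict[str, list] = {
        "question": [], "answer": [], "contexts": [], "ground_truth": [],
    }
    for i, row in enumerate(rows):
        ctx = row["contexts"]
        # Passing a single string instead of a list of chunks is an easy mistake.
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            raise TypeError(f"row {i}: contexts must be a list of strings")
        for key in columns:
            columns[key].append(row[key])
    return columns


rows: List[EvalRow] = [
    {
        "question": "What is the capital of France?",
        "answer": "Paris.",
        "contexts": ["Paris is the capital of France."],
        "ground_truth": "Paris",
    },
]
columns = to_columns(rows)
print(columns["contexts"])
```

The resulting `columns` dict has the shape that `Dataset.from_dict` expects in the snippet above.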

Gotchas

  • Ragas makes at least one LLM call per metric per row. On a 1000-row eval with three metrics that is 3,000 or more calls, so use a cheap judge model.
  • LLM-as-judge is noisy. Run each eval at least twice and report the mean, not a single number.
  • Metrics are heuristics, not ground truth. Always spot-check outliers by hand.
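The three gotchas can be sketched in a few lines of plain Python: a lower bound on judge calls, a mean and spread over repeated runs, and a pick of the lowest-scoring rows to read by hand. All three helpers are hypothetical, the scores are invented, and the call estimate assumes exactly one judge call per metric per row, which is the minimum rather than what any given metric actually makes.

```python
from statistics import mean, stdev
from typing import Dict, List


def estimate_min_judge_calls(n_rows: int, n_metrics: int, n_runs: int = 1) -> int:
    """Lower bound on judge calls, assuming one call per metric per row."""
    return n_rows * n_metrics * n_runs


def summarize_runs(runs: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Mean and sample standard deviation per metric across repeated runs."""
    summary = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs]
        summary[metric] = {
            "mean": mean(values),
            "stdev": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


def lowest_rows(row_scores: List[float], k: int = 3) -> List[int]:
    """Indices of the k lowest-scoring rows, worst first, for manual review."""
    return sorted(range(len(row_scores)), key=lambda i: row_scores[i])[:k]


# Made-up scores for illustration only.
calls = estimate_min_judge_calls(n_rows=1000, n_metrics=3, n_runs=2)
summary = summarize_runs([{"faithfulness": 0.80}, {"faithfulness": 0.90}])
to_review = lowest_rows([0.9, 0.2, 0.7, 0.4], k=2)
print(calls, summary, to_review)
```

Reporting the spread next to the mean makes it easier to tell whether a change between two pipeline versions is larger than the judge's own run-to-run variation.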

Related tools