End Vibes-Based RAG — Evals as the Control Plane
Vibe-checks break at scale. Most RAG/agent systems ship on vibes: a few demo queries, a thumbs-up, and into prod they go.
In this talk, we show how to move evals from a dashboard into the decision loop. Using a real case study over a large academic corpus with strict citation requirements, we wired gates for faithfulness, answer relevancy, context relevancy, and citation integrity so agents either iterate or proceed. With traces (Arize, Langfuse, LangSmith—pick your stack) tied to each step, bad retrievals trigger re-search instead of bad answers reaching users.
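The gate-then-iterate loop described above can be sketched in a few lines. Everything here is a hypothetical stand-in: `retrieve` and `generate` are stubs for your retriever and LLM, `faithfulness` is a toy token-overlap score in place of a real LLM-as-judge metric, and the 0.7 threshold is an assumption you would tune per corpus.

```python
FAITHFULNESS_GATE = 0.7  # assumed threshold; tune against labeled examples
MAX_RETRIES = 2

def retrieve(query: str, attempt: int) -> list[str]:
    # Stub retriever: pretend a broadened query on retry fixes a bad retrieval.
    corpus = {0: ["irrelevant passage"],
              1: ["RAG grounds answers in retrieved context."]}
    return corpus.get(attempt, corpus[1])

def generate(query: str, contexts: list[str]) -> str:
    # Stub generator: returns a fixed answer so the gate's behavior is visible.
    return "RAG grounds answers in retrieved context."

def faithfulness(answer: str, contexts: list[str]) -> float:
    # Toy metric: token overlap between answer and contexts
    # (stand-in for an LLM-as-judge or NLI-based scorer).
    ans = set(answer.lower().split())
    ctx = set(" ".join(contexts).lower().split())
    return len(ans & ctx) / max(len(ans), 1)

def answer_with_gate(query: str) -> tuple[str, float, int]:
    # Eval-as-control-plane: the score decides iterate vs. proceed.
    for attempt in range(MAX_RETRIES + 1):
        contexts = retrieve(query, attempt)
        answer = generate(query, contexts)
        score = faithfulness(answer, contexts)
        if score >= FAITHFULNESS_GATE:
            return answer, score, attempt  # gate passed: proceed
        # gate failed: re-search instead of shipping a bad answer
    return answer, score, attempt  # retries exhausted: flag for review
```

On the first attempt the retrieval is bad, faithfulness scores 0.0, and the loop re-searches instead of returning the answer; the second attempt passes the gate.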
You’ll leave with a vendor-neutral recipe for eval-as-control-plane that improves answer quality and gives QA/Legal evidence they can sign off on.
Prerequisites
- Basic RAG concepts
- Why LLM outputs are non-deterministic
Learning objectives
- A vendor-neutral eval-in-the-loop pattern
- How to define and score faithfulness, answer/context relevancy, and citation integrity
- Where to place gates, how to pick thresholds, and how to trace decisions across Braintrust, LangSmith, or Langfuse
- Privacy/PII guardrails that pass review
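As one concrete instance of the scoring metrics listed above, here is a toy citation-integrity check. The `[doc:N]` marker format and the "uncited answers fail" policy are assumptions for illustration, not a fixed convention from the talk:

```python
import re

def citation_integrity(answer: str, retrieved_ids: set[str]) -> float:
    # Toy check: every [doc:N] marker in the answer must point at a chunk
    # that was actually retrieved; returns the fraction of valid citations.
    cited = re.findall(r"\[doc:(\w+)\]", answer)
    if not cited:
        # Strict-citation corpora: an answer with no citations fails the gate.
        return 0.0
    valid = sum(1 for c in cited if c in retrieved_ids)
    return valid / len(cited)
```

A gate would compare this score against a threshold exactly like the faithfulness gate, so a hallucinated citation triggers iteration rather than reaching users.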