End Vibes-Based RAG — Evals as the Control Plane
Vibe-checks break at scale. Most RAG/agent systems ship on vibes: a few demo queries, a thumbs-up, and into prod they go.
In this talk, we show how to move evals from a dashboard into the decision loop. Using a real case study over a large academic corpus with strict citation requirements, we wired gates for faithfulness, answer relevancy, context relevancy, and citation integrity so agents either iterate or proceed. With traces (Arize, Langfuse, LangSmith—pick your stack) tied to each step, bad retrievals trigger re-search instead of bad answers reaching users.
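The gate-then-iterate loop described above can be sketched in a few lines. Everything here is a hypothetical stand-in: `retrieve` and `generate` are stubs for your retriever and LLM, `faithfulness` is a toy token-overlap score in place of a real LLM-as-judge metric, and the 0.7 threshold is an assumption you would tune per corpus.

```python
FAITHFULNESS_GATE = 0.7  # assumed threshold; tune against labeled examples
MAX_RETRIES = 2

def retrieve(query: str, attempt: int) -> list[str]:
    # Stub retriever: pretend a broadened query on retry fixes a bad retrieval.
    corpus = {0: ["irrelevant passage"],
              1: ["RAG grounds answers in retrieved context."]}
    return corpus.get(attempt, corpus[1])

def generate(query: str, contexts: list[str]) -> str:
    # Stub generator: returns a fixed answer so the gate's behavior is visible.
    return "RAG grounds answers in retrieved context."

def faithfulness(answer: str, contexts: list[str]) -> float:
    # Toy metric: token overlap between answer and contexts
    # (stand-in for an LLM-as-judge or NLI-based scorer).
    ans = set(answer.lower().split())
    ctx = set(" ".join(contexts).lower().split())
    return len(ans & ctx) / max(len(ans), 1)

def answer_with_gate(query: str) -> tuple[str, float, int]:
    # Eval-as-control-plane: the score decides iterate vs. proceed.
    for attempt in range(MAX_RETRIES + 1):
        contexts = retrieve(query, attempt)
        answer = generate(query, contexts)
        score = faithfulness(answer, contexts)
        if score >= FAITHFULNESS_GATE:
            return answer, score, attempt  # gate passed: proceed
        # gate failed: re-search instead of shipping a bad answer
    return answer, score, attempt  # retries exhausted: flag for review
```

On the first attempt the retrieval is bad, faithfulness scores 0.0, and the loop re-searches instead of returning the answer; the second attempt passes the gate.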
You’ll leave with a vendor-neutral recipe for eval-as-control-plane that improves answer quality and gives QA/Legal evidence they can sign off on.
Prerequisites
- Basic RAG concepts
- Why LLM outputs are non-deterministic
Learning objectives
- A vendor-neutral eval-in-the-loop pattern
- How to define and score faithfulness, answer/context relevancy, and citation integrity
- Where to place gates, how to pick thresholds, and how to trace decisions across Braintrust, LangSmith, or Langfuse
- Privacy/PII guardrails that pass review
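As one concrete instance of the scoring metrics listed above, here is a toy citation-integrity check. The `[doc:N]` marker format and the "uncited answers fail" policy are assumptions for illustration, not a fixed convention from the talk:

```python
import re

def citation_integrity(answer: str, retrieved_ids: set[str]) -> float:
    # Toy check: every [doc:N] marker in the answer must point at a chunk
    # that was actually retrieved; returns the fraction of valid citations.
    cited = re.findall(r"\[doc:(\w+)\]", answer)
    if not cited:
        # Strict-citation corpora: an answer with no citations fails the gate.
        return 0.0
    valid = sum(1 for c in cited if c in retrieved_ids)
    return valid / len(cited)
```

A gate would compare this score against a threshold exactly like the faithfulness gate, so a hallucinated citation triggers iteration rather than reaching users.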