
End Vibes-Based RAG — Evals as the Control Plane

Vibe-checks break at scale. Most RAG/agent systems ship on vibes: a few demo queries, a thumbs-up, and into prod they go.

In this talk, we show how to move evals from a dashboard into the decision loop. Using a real case study over a large academic corpus with strict citation requirements, we wired gates for faithfulness, answer relevancy, context relevancy, and citation integrity so agents either iterate or proceed. With traces (Arize, Langfuse, LangSmith—pick your stack) tied to each step, bad retrievals trigger re-search instead of bad answers reaching users.
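The "iterate or proceed" decision described above can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the threshold values, the `check_gates` and `answer_with_gates` helpers, and the `retrieve`/`generate`/`evaluate` callables are all hypothetical placeholders for whatever retriever, model, and eval library a team actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical thresholds; real values would be tuned on a labeled set.
THRESHOLDS = {
    "faithfulness": 0.8,
    "answer_relevancy": 0.7,
    "context_relevancy": 0.7,
    "citation_integrity": 1.0,
}


@dataclass
class GateResult:
    passed: bool
    failed_metrics: list[str]


def check_gates(scores: dict[str, float],
                thresholds: dict[str, float] = THRESHOLDS) -> GateResult:
    """Pass only if every metric meets its threshold; a missing score fails."""
    failed = [m for m, t in thresholds.items() if scores.get(m, 0.0) < t]
    return GateResult(passed=not failed, failed_metrics=failed)


def answer_with_gates(query: str,
                      retrieve: Callable,
                      generate: Callable,
                      evaluate: Callable,
                      max_attempts: int = 3):
    """Re-search until the gates pass or the attempt budget runs out."""
    trace = []  # one record per attempt, suitable for export to a tracing tool
    for attempt in range(1, max_attempts + 1):
        context = retrieve(query, attempt)          # assumed: may widen search per attempt
        answer = generate(query, context)
        scores = evaluate(query, context, answer)   # assumed: returns metric -> score in [0, 1]
        result = check_gates(scores)
        trace.append({"attempt": attempt, "scores": scores,
                      "failed": result.failed_metrics})
        if result.passed:
            return answer, trace
    return None, trace  # no answer is returned rather than a failing one
```

Returning `None` when every attempt fails is one possible policy; a production system might instead escalate to a human or return a refusal message.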

You’ll leave with a vendor-neutral recipe for eval-as-control-plane that improves quality and gives QA/Legal something they can sign off on.

Prerequisites

  • Basic RAG concepts
  • Why LLMs are non-deterministic

Learning objectives

  • A vendor-neutral eval-in-the-loop pattern
  • How to define and score faithfulness, answer/context relevancy, and citation integrity
  • Where to place gates, pick thresholds, and trace decisions across Braintrust/LangSmith/Langfuse
  • Privacy/PII guardrails that pass review
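
As a rough illustration of two of these objectives, the sketch below scores citation integrity and redacts email addresses before text is written to a trace. Everything here is an assumption made for the example: the `[doc-id]` citation format, the strict "no citations means zero" rule, and the email-only redaction are hypothetical choices, and a real PII guardrail would cover far more than email addresses.

```python
import re

# Assumed citation format: bracketed source IDs such as [doc1] in the answer text.
CITATION_PATTERN = re.compile(r"\[([A-Za-z0-9_-]+)\]")
# Simplified email matcher for illustration only; not a complete PII detector.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def citation_integrity(answer: str, retrieved_ids: set) -> float:
    """Fraction of cited IDs that actually appear in the retrieved context."""
    cited = CITATION_PATTERN.findall(answer)
    if not cited:
        return 0.0  # strict policy: an answer without citations fails the gate
    valid = sum(1 for c in cited if c in retrieved_ids)
    return valid / len(cited)


def redact_emails(text: str) -> str:
    """Mask email addresses before the text is stored in a trace."""
    return EMAIL_PATTERN.sub("<redacted-email>", text)


score = citation_integrity("Claim A [doc1] and claim B [doc9].", {"doc1"})
print(score)  # 0.5: one of the two cited IDs was not retrieved
print(redact_emails("Contact jane.doe@example.org for data."))
```

A score below 1.0 means at least one citation points at a source the retriever never returned, which is the kind of failure a citation gate would typically block.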

Speaker

Jeff Fan
Jeff Fan is a Solutions Architect at DigitalOcean designing Kubernetes-based GPU stacks for LLM inference. He speaks on right-sizing LLM serving (vLLM/KServe/llm-d), building memory-enabled agents, and eval-first RAG ("evals, not vibes"). He turns cloud/AI complexity into copy-paste playbooks that help teams move from PoC to production.