What Are Evaluations?
An evaluation is the process of analyzing a call against a set of criteria. Think of it as an automated report card for every conversation. Instead of manually listening to calls, you define what matters and let Tuner do the monitoring at scale.
Why Evaluations Matter: Without evaluations, you’re flying blind. You might know that a call “failed,” but you won’t know why. Evaluations give you the “why” by checking for specific behaviors, metrics, and outcomes.
What Are Guardrails?
A guardrail is a set of rules that defines what “bad” looks like for your agent. Guardrails let you catch issues early by automatically flagging calls that exhibit problematic behavior. When a guardrail is breached, it can trigger a red flag or an alert.
Why Guardrails Matter: Guardrails are your safety net. They ensure that even if you’re not actively monitoring, you’ll be notified when something goes wrong.
How They Work Together
The same evaluations and guardrails run on every production call and every Call Simulation call. What you configure here powers both monitoring and testing.
Types of Evaluations in Tuner
- Evals
- Voice Metrics
- Labels & Red Flags
Evals are custom evaluations that verify whether your agent performed specific actions or followed certain guidelines during a call. You define the criteria, and Tuner automatically evaluates every call against them.
Example: Create an eval called “Appointment Confirmed” that verifies the agent explicitly confirmed the date, time, and location of an appointment before ending the call.
See Creating Custom Evaluations.
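To make the “Appointment Confirmed” idea concrete, here is a minimal Python sketch of an eval as a named pass/fail check over a call transcript. This is not Tuner's API or how Tuner evaluates calls internally: the names Turn, Eval, make_appointment_confirmed, and run_evals are hypothetical, and the substring match is a deliberately naive stand-in for whatever criteria you define in Tuner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Turn:
    """One utterance in a call transcript (hypothetical structure)."""
    speaker: str  # "agent" or "user"
    text: str


@dataclass
class Eval:
    """A named criterion that a call either passes or fails (hypothetical)."""
    name: str
    check: Callable[[List[Turn]], bool]


def make_appointment_confirmed(date: str, time: str, location: str) -> Eval:
    """Build an eval that passes only if the agent stated all three details."""

    def check(turns: List[Turn]) -> bool:
        # Only the agent's words count toward an explicit confirmation.
        agent_text = " ".join(
            t.text.lower() for t in turns if t.speaker == "agent"
        )
        # Naive assumption: a detail is "confirmed" if it appears verbatim.
        return all(v.lower() in agent_text for v in (date, time, location))

    return Eval(name="Appointment Confirmed", check=check)


def run_evals(turns: List[Turn], evals: List[Eval]) -> Dict[str, bool]:
    """Evaluate one call against every eval and return name -> pass/fail."""
    return {e.name: e.check(turns) for e in evals}
```

In this sketch, a call in which the agent says the date and time but never mentions the location would fail the eval, which is the kind of “why” a plain success or failure status does not reveal.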
Common Use Cases
Detecting Hallucinations
Create a guardrail that flags calls where the agent provides incorrect information.
Ensuring Compliance
Create an eval that verifies the agent always reads a required disclaimer.
Monitoring Sentiment
Create a guardrail that alerts you when user sentiment is consistently negative.
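As a rough illustration of what “consistently negative” could mean, the Python sketch below treats a guardrail as breached when a run of consecutive user turns all score below a negativity threshold. Everything here is an assumption made for illustration: the function name, the score range of -1.0 to 1.0, and both default thresholds are hypothetical and do not describe how Tuner computes sentiment or configures guardrails.

```python
from typing import Sequence


def sentiment_guardrail_breached(
    user_turn_scores: Sequence[float],
    negative_below: float = -0.3,  # assumed cutoff for a "negative" turn
    min_consecutive: int = 3,      # assumed run length for "consistently"
) -> bool:
    """Return True if enough consecutive user turns are negative.

    Scores are assumed to lie in [-1.0, 1.0], one per user turn, in call order.
    """
    streak = 0
    for score in user_turn_scores:
        if score < negative_below:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            # Any non-negative turn resets the run.
            streak = 0
    return False
```

Requiring a run of negative turns, instead of reacting to a single low score, keeps one frustrated remark from raising an alert while still catching calls that stay negative.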
Next Steps
Learn about the pre-defined metrics that Tuner computes for every call.
Introduction to Pre-defined Metrics
An explanation of all out-of-the-box metrics and what they measure.