Diagnosis requires two things: your agent must be set up and connected, and calls must be flowing through it. The more calls Tuner has received, the deeper and more precise the analysis.
Run a Diagnosis
Paste this into your chat to run a full diagnosis:
What the Diagnosis Looks At
Following the tuner_analyze_agent prompt, the assistant works through your data in order, the same path a human analyst would take:
Reads your agent's configuration
Pulls the system prompt, workflow, and description, plus everything you’ve defined: outcomes, intents, data-extraction fields, evals, and red flags. This is the ground truth for what “good” looks like; everything else is measured against it.
Builds the health picture
Reviews success rate, red-flag rate, call volume, and trends over the last 30 days, then breaks calls down across your outcome labels and intent categories, to tell whether the agent is broadly healthy with isolated issues or degraded across the board.
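As a rough illustration of what the health picture summarizes, the sketch below computes a success rate, a red-flag rate, and a per-intent call count from a list of call records. The `Call` record shape, its field names, and the sample labels other than Billing Dispute and Unresolved are assumptions made for this example, not Tuner's actual data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Call:
    # Hypothetical record shape; Tuner's real call schema may differ.
    intent: str
    outcome: str
    success: bool
    red_flagged: bool


def health_summary(calls: list[Call]) -> dict:
    """Summarize overall rates and per-intent call volume."""
    total = len(calls)
    if total == 0:
        # No calls means there is no rate to report.
        return {"total": 0, "success_rate": None, "red_flag_rate": None, "by_intent": {}}
    return {
        "total": total,
        "success_rate": sum(c.success for c in calls) / total,
        "red_flag_rate": sum(c.red_flagged for c in calls) / total,
        "by_intent": dict(Counter(c.intent for c in calls)),
    }


# Made-up sample data for illustration only.
sample_calls = [
    Call("Billing Dispute", "Unresolved", False, True),
    Call("Billing Dispute", "Resolved", True, False),
    Call("Order Status", "Resolved", True, False),
    Call("Order Status", "Resolved", True, False),
]
summary = health_summary(sample_calls)
```

A high overall success rate with failures concentrated in one intent suggests isolated issues; low rates across every intent suggest the agent is degraded across the board.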
Finds the priority signals
Looks at recently red-flagged calls and which intents and outcomes they cluster on. A red flag concentrated on one intent with bad outcomes becomes the lead to chase.
Isolates the problem segments
Filters the call logs down to the underperforming intent + outcome combinations and checks whether the problem is chronic or started at a specific agent version. A version-correlated change points to a regression, not a long-standing gap.
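One way to picture the chronic-versus-regression check is to compare the failure rate of a segment across agent versions. The sketch below is only an illustration: the function names, the `(version, failed)` input shape, and the 0.2 jump threshold are all assumptions, not how the assistant actually decides.

```python
from collections import defaultdict


def failure_rate_by_version(calls: list[tuple[str, bool]]) -> dict[str, float]:
    """calls: (agent_version, failed) pairs for one intent + outcome segment."""
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for version, failed in calls:
        totals[version] += 1
        failures[version] += int(failed)
    return {v: failures[v] / totals[v] for v in totals}


def classify(rates_in_release_order: list[float], jump: float = 0.2) -> str:
    """Return 'regression' if the failure rate jumps between two
    consecutive versions by at least `jump`, otherwise 'chronic'."""
    pairs = zip(rates_in_release_order, rates_in_release_order[1:])
    for previous, current in pairs:
        if current - previous >= jump:
            return "regression"
    return "chronic"


# Made-up segment: version labels and results are illustrative only.
segment = [
    ("v1", False), ("v1", False), ("v1", True), ("v1", False),
    ("v2", True), ("v2", True), ("v2", False), ("v2", True),
]
rates = failure_rate_by_version(segment)
label = classify([rates["v1"], rates["v2"]])
```

Here the failure rate rises sharply from the first version to the second, so the segment is labeled a regression; a rate that stays roughly flat across versions would be labeled chronic.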
Digs into individual calls
Opens the worst calls in detail: which evals are failing, whether STT, LLM, TTS, or end-to-end latency is elevated, and whether the same red flags fire consistently.
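To make the latency comparison concrete, the sketch below takes per-call latency samples for STT, the LLM, and TTS and reports which component has the highest median. The sample numbers, the millisecond unit, and the dictionary layout are assumptions for illustration; Tuner's call details may expose latency differently.

```python
from statistics import median

# Hypothetical per-call latencies in milliseconds.
call_latencies = [
    {"stt": 180, "llm": 950, "tts": 210},
    {"stt": 200, "llm": 1400, "tts": 190},
    {"stt": 170, "llm": 1200, "tts": 220},
]


def slowest_component(samples: list[dict[str, int]]) -> tuple[str, dict[str, float]]:
    """Return the component with the highest median latency, plus all medians."""
    medians = {name: median(s[name] for s in samples) for name in samples[0]}
    return max(medians, key=medians.get), medians


component, medians = slowest_component(call_latencies)
```

Using the median rather than the mean keeps a single unusually slow call from dominating the comparison.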
What You Get Back
The assistant returns a structured diagnosis you can act on immediately:
- Agent health: a one-line summary with the headline number that supports it.
- Issues found: each with the Signal (the data, with numbers), the Pattern across failing calls, the Root cause hypothesis and confidence, a specific Fix (what to change and where, not “improve your prompt”), and how to Verify it worked.
- What’s working: so you know what not to touch.
- Data gaps: configuration that, if added, would sharpen future diagnoses.
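If it helps to think of the diagnosis as data, the four parts above could be modeled roughly as follows. The class and field names are hypothetical, chosen to mirror the list; the assistant returns its diagnosis as text in the chat, not as these objects. The sample values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    # One entry under "Issues found"; field names mirror the list above.
    signal: str
    pattern: str
    root_cause: str
    confidence: str
    fix: str
    verify: str


@dataclass
class Diagnosis:
    agent_health: str
    issues: list[Issue] = field(default_factory=list)
    whats_working: list[str] = field(default_factory=list)
    data_gaps: list[str] = field(default_factory=list)


# Illustrative values only.
diagnosis = Diagnosis(
    agent_health="Broadly healthy; one intent is degraded.",
    issues=[
        Issue(
            signal="Billing Dispute ends Unresolved on 48% of calls vs. 12% agent-wide",
            pattern="Agent never collects the account number before acting",
            root_cause="Prompt has no account-verification step",
            confidence="high",
            fix="Add an account-verification step to the prompt",
            verify="Add an eval for account verification and watch the Unresolved rate",
        )
    ],
)
```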
Example diagnosis output
From Diagnosis to Fix
A diagnosis that you can’t act on is just a report. The power of doing this over the MCP is that the same assistant can apply the fix in the same conversation, because the MCP’s write tools cover everything in your Agent Settings. After it explains the root cause, just tell it to act. It can:
- Update the prompt and workflow guidance to handle the scenario that’s failing.
- Create or refine evals so the behavior is scored on every future call.
- Add call outcomes and intents so the right calls get categorized instead of falling through.
- Add data-extraction fields to capture the values you need to measure the problem.
- Add red flags and alerts so you’re warned the moment it happens again.
Examples
A Spike in Unresolved Calls on One Intent
Suppose your Billing Dispute intent is ending Unresolved far more often than the rest of your calls. Ask:
“Diagnose my agent: why are so many Billing Dispute calls ending unresolved, and how do I fix it?”
The assistant pulls the outcome breakdown and sees Billing Dispute sitting at a 48% Unresolved rate vs. 12% agent-wide. It opens the worst calls and finds the same pattern: the agent can’t move forward because it never collects the account number before trying to act on the dispute. It reports the signal, the pattern, the root cause, and a concrete fix. Then you close the loop in the same chat:
“Apply that. Add an account-verification step to the prompt, create an eval that checks the agent collected and verified the account number, and add a data field to capture it.”
The assistant updates the guidance and, through the MCP’s write tools, creates the eval (“Did the agent verify the account number before discussing the dispute?”) and the data-extraction field. Now every future Billing Dispute call is measured on exactly that behavior, so your next diagnosis can confirm the Unresolved rate is dropping instead of guessing.
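The comparison behind this example, one intent's Unresolved rate against the agent-wide rate, can be sketched as follows. The call counts are made up so that they reproduce the 48% and 12% figures, and the function name and the 2x threshold are assumptions for illustration.

```python
def outlier_intents(
    counts: dict[str, tuple[int, int]], factor: float = 2.0
) -> dict[str, float]:
    """counts maps intent -> (unresolved_calls, total_calls).

    Returns intents whose Unresolved rate is at least `factor` times
    the agent-wide Unresolved rate.
    """
    total_unresolved = sum(u for u, _ in counts.values())
    total_calls = sum(t for _, t in counts.values())
    if total_calls == 0:
        return {}
    overall = total_unresolved / total_calls
    return {
        intent: u / t
        for intent, (u, t) in counts.items()
        if t > 0 and u / t >= factor * overall
    }


# Made-up counts: 24/50 = 48% for Billing Dispute, 60/500 = 12% agent-wide.
counts = {
    "Billing Dispute": (24, 50),
    "All other intents": (36, 450),
}
flagged = outlier_intents(counts)
```

With these counts only Billing Dispute is flagged, since its rate is four times the agent-wide rate.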
Finding the Problems You Didn’t Know to Look For
You can only write evals for problems you already know about. The ones you don’t know about are usually where calls quietly fail. Diagnosis is good at surfacing them. Ask:
“Are there calls that don’t match any of my configured intents? Cluster them and tell me what they’re about.”
The assistant scans your call logs for calls that landed with no matching intent, reads the transcripts, and groups them into themes. For example, 62 calls in the last 30 days asking about international shipping, an intent you never defined. Because there’s no flow for it, those calls fall through to a generic response and abandon at a high rate. The assistant explains the theme, shows you example transcripts, and recommends:
- Add an International Shipping intent so these calls are tracked.
- Add a prompt/workflow branch that actually handles the request.
- Add an eval to check the agent gives accurate shipping information.
Keep Asking
The diagnosis is a starting point, not the end. The assistant still has your whole workspace at its fingertips, so you can drill into anything in plain language:
“Show me the transcripts of the 3 worst calls behind that issue.”
“What’s different about the calls that fail versus the ones that succeed on this intent?”
“Break down latency for those calls. Which component is slow: STT, the LLM, or TTS?”
“Did this start at a specific agent version, or has it always been like this?”
“Draft the exact prompt change you’re recommending, then create the eval to track it.”
“Which eval is failing the most this week, and on what kind of calls?”
Diagnosis needs real calls to analyze. If your agent hasn’t received any calls in the last 30 days, the assistant will ask you to send calls through it first. With fewer than 10 calls it will still run, but it flags that the findings may not yet be statistically reliable.
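The call-volume thresholds above amount to a simple gate. The sketch below expresses it in code; the function name and the returned labels are hypothetical and only mirror the behavior described in the note.

```python
def data_sufficiency(calls_last_30_days: int) -> str:
    """Map the 30-day call count to how the diagnosis is described as behaving."""
    if calls_last_30_days == 0:
        # No calls: the assistant asks you to send calls through the agent first.
        return "needs_calls"
    if calls_last_30_days < 10:
        # Runs, but flags that findings may not be statistically reliable.
        return "runs_with_warning"
    return "runs"
```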
What’s Next?
Set Up Your Agent
Connect the Tuner MCP to your IDE or chatbot and configure your agent.
Manage Tuner with MCP
Examples, best practices, and use cases for managing your agent with AI assistants.
How to Use Red Flags
Understand the red flags the diagnosis prioritizes when triaging issues.
Creating Custom Evaluations
Learn how the evals you create from a diagnosis are defined and scored.