From Observed to Understood
Getting calls into Tuner is the first step. The second is understanding what the data is telling you. The dashboard shows you that things are failing. Diagnosis shows you why, and closes the loop: fixing the prompt, creating the evals that catch the issue, and adding the data capture that keeps it visible going forward. This is done through the Tuner MCP. Because the MCP can both read your entire workspace and write back to it, your AI assistant can diagnose a problem and fix it in the same conversation.
Running a Diagnosis
Once the Tuner MCP is connected, paste this into your chat:
What You Can Discover
Problems you know about
If you can see a metric that looks wrong, ask directly:
“Why are so many of my Billing Dispute calls ending unresolved? Diagnose and fix it.”
The assistant pulls the outcome breakdown, opens the worst calls, finds the pattern, and explains it. Then you can tell it to apply the fix in the same chat: update the prompt, create the eval that tracks the corrected behavior, and add the data field that makes it measurable.
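To make "outcome breakdown" concrete, here is a minimal sketch of the kind of tally involved. It is not Tuner's implementation or API: the record fields (`intent`, `outcome`), the sample data, and both function names are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical call records. The field names are assumptions,
# not Tuner's actual schema.
calls = [
    {"intent": "Billing Dispute", "outcome": "unresolved"},
    {"intent": "Billing Dispute", "outcome": "unresolved"},
    {"intent": "Billing Dispute", "outcome": "resolved"},
    {"intent": "Order Status", "outcome": "resolved"},
]


def outcome_breakdown(calls):
    """Count how many calls ended in each outcome, per intent."""
    breakdown = defaultdict(Counter)
    for call in calls:
        breakdown[call["intent"]][call["outcome"]] += 1
    return breakdown


def unresolved_rate(breakdown, intent):
    """Share of an intent's calls that ended unresolved (0.0 if no calls)."""
    counts = breakdown[intent]
    total = sum(counts.values())
    return counts["unresolved"] / total if total else 0.0


breakdown = outcome_breakdown(calls)
billing_rate = unresolved_rate(breakdown, "Billing Dispute")
```

A high rate for one intent is the signal to open its worst calls and look for the shared pattern.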
Problems you didn’t know about
You can only write evals for behaviors you already know to look for. The ones you don’t know about are usually where calls quietly fail. The diagnosis can surface them:
“Are there calls that don’t match any of my configured intents? Cluster them and tell me what they’re about.”
The assistant finds calls with no matching intent, reads the transcripts, and groups them into themes. Maybe 60 calls in the last month asked about something your agent was never designed to handle, and they’re all dropping off. Now you know: add the intent, add the prompt branch, add the eval.
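The grouping step can be pictured with a rough sketch like the one below. It uses plain keyword bucketing as a stand-in for the assistant reading transcripts, so it is not how the diagnosis actually clusters calls. The record fields (`id`, `intent`, `transcript`), the `THEMES` keyword map, and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical theme keywords, for illustration only.
THEMES = {
    "warranty": ["warranty", "guarantee"],
    "shipping": ["shipping", "delivery"],
}


def group_unmatched(calls, themes=THEMES):
    """Bucket calls that have no matched intent by the first theme
    whose keywords appear in the transcript; anything else is 'other'."""
    groups = defaultdict(list)
    for call in calls:
        if call.get("intent") is not None:
            continue  # already matched a configured intent
        text = call["transcript"].lower()
        theme = next(
            (name for name, words in themes.items()
             if any(word in text for word in words)),
            "other",
        )
        groups[theme].append(call["id"])
    return dict(groups)


# Hypothetical sample data; field names are assumptions.
sample_calls = [
    {"id": 1, "intent": None, "transcript": "Is my blender still under warranty?"},
    {"id": 2, "intent": "Billing Dispute", "transcript": "I was charged twice."},
    {"id": 3, "intent": None, "transcript": "Where is my delivery?"},
    {"id": 4, "intent": None, "transcript": "Can I talk to someone in Spanish?"},
]

themes_found = group_unmatched(sample_calls)
```

A large bucket is a candidate for a new intent, and a large "other" bucket suggests the transcripts need a closer read.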
The Loop
Diagnosis is not a one-time event. Each pass leaves your agent better instrumented:
- Diagnose what’s failing and why
- Fix the prompt or workflow to address it
- Add evals and data fields so the behavior is measured on every future call
- Re-diagnose to confirm the fix worked
Keep Asking
After the initial diagnosis, keep the conversation going:
“Show me the transcripts of the worst calls behind that issue.”
“Did this problem start at a specific agent version, or has it always been there?”
“Which eval is failing the most this week, and on what kind of calls?”
“Draft the exact prompt change, then create the eval to track it.”
Diagnosis needs real call data. If your agent hasn’t received calls in the last 30 days, the assistant will ask you to send some through first. With fewer than 10 calls it will still run, but it notes that the findings may not yet be statistically reliable.
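The two thresholds above can be read as a simple decision rule. The sketch below is one possible reading, not the assistant's actual logic: the function name, its parameters, and the return labels are hypothetical, and it assumes the 10-call threshold is counted within the same 30-day window, which the text does not state.

```python
from datetime import datetime, timedelta


def diagnosis_readiness(call_times, now, window_days=30, min_calls=10):
    """Classify whether there is enough recent call data to diagnose.

    Assumption: the minimum call count applies to calls inside the window.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [t for t in call_times if t >= cutoff]
    if not recent:
        return "needs_calls"      # ask the user to send calls through first
    if len(recent) < min_calls:
        return "run_with_caveat"  # run, but flag limited reliability
    return "run"


now = datetime(2024, 6, 30)
few_calls = [now - timedelta(days=d) for d in (1, 2, 3)]
status = diagnosis_readiness(few_calls, now)
```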
Full Reference
For the complete guide, including the diagnosis output format, worked examples, and a full list of follow-up questions:
Diagnose Your Agent
The full reference: how the diagnosis works, what you get back, and detailed examples.
Set Up Your Agent
Connect the Tuner MCP to your IDE or chatbot in a few minutes.