From Observed to Understood

Getting calls into Tuner is the first step. The second is understanding what the data is telling you. The dashboard shows you that things are failing. Diagnosis shows you why and then closes the loop: it fixes the prompt, creates the evals that catch the issue, and adds the data capture that keeps the issue visible going forward. Diagnosis runs through the Tuner MCP. Because the MCP can both read your entire workspace and write back to it, your AI assistant can diagnose a problem and fix it in the same conversation.

Running a Diagnosis

Once the Tuner MCP is connected, paste this into your chat:

  Diagnose my agent and tell me what is broken and how to fix it.
  Fetch and follow the 'tuner_analyze_agent' prompt from the Tuner MCP server.

  agent_id: 485
  workspace_id: 1420
  agent_name: agent007

Replace agent_id, workspace_id, and agent_name with your own values. Find them in Workspace Settings on the Tuner dashboard.
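
If you run diagnoses for several agents, it can help to generate the prompt text rather than edit it by hand. The sketch below is only a string template around the prompt shown above. The function name build_diagnosis_prompt is made up for this example and is not part of Tuner or the MCP.

```python
def build_diagnosis_prompt(agent_id: int, workspace_id: int, agent_name: str) -> str:
    """Fill the diagnosis prompt with your own identifiers.

    Hypothetical helper: it only formats text to paste into your chat.
    """
    return (
        "Diagnose my agent and tell me what is broken and how to fix it.\n"
        "Fetch and follow the 'tuner_analyze_agent' prompt from the Tuner MCP server.\n"
        "\n"
        f"agent_id: {agent_id}\n"
        f"workspace_id: {workspace_id}\n"
        f"agent_name: {agent_name}"
    )


# The example values from this page; replace them with your own.
print(build_diagnosis_prompt(485, 1420, "agent007"))
```

The output is plain text, so you paste it into the chat exactly as you would the hand-edited version.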
The assistant works through your data the way an analyst would. It reads your agent’s config, checks the health dashboard, identifies which intents and outcomes are underperforming, opens the worst calls, and cross-references the failures against your system prompt to find the root cause. It then reports back with five things: the signal (the numbers), the pattern, the root cause, a specific fix, and the metric to watch.
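
One way to picture the five parts of a finding is as a small record. The sketch below is an assumption made for illustration, not Tuner's actual output format (the full reference covers that). The class name, field names, and every example value, including the percentage, are invented.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical shape of one diagnosis finding (not Tuner's schema)."""
    signal: str           # the numbers that flagged the problem
    pattern: str          # what the failing calls have in common
    root_cause: str       # why it happens, traced back to the prompt or workflow
    fix: str              # the specific change to make
    metric_to_watch: str  # how to confirm the fix worked

    def summary(self) -> str:
        return "\n".join([
            f"Signal: {self.signal}",
            f"Pattern: {self.pattern}",
            f"Root cause: {self.root_cause}",
            f"Fix: {self.fix}",
            f"Watch: {self.metric_to_watch}",
        ])


# All values below are invented for illustration.
example = Finding(
    signal="40% of Billing Dispute calls end unresolved",
    pattern="caller asks for a refund and the agent repeats the policy",
    root_cause="the system prompt has no refund escalation branch",
    fix="add a refund branch that offers a handoff to a human",
    metric_to_watch="unresolved rate for Billing Dispute",
)
print(example.summary())
```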

What You Can Discover

Problems you know about

If you can see a metric that looks wrong, ask directly:
“Why are so many of my Billing Dispute calls ending unresolved? Diagnose and fix it.”
The assistant pulls the outcome breakdown, opens the worst calls, finds the pattern, and explains it. Then you can tell it to apply the fix in the same chat: update the prompt, create the eval that tracks the corrected behavior, and add the data field that makes it measurable.

Problems you didn’t know about

You can only write evals for behaviors you already know to look for. The ones you don’t know about are usually where calls quietly fail. The diagnosis can surface them:
“Are there calls that don’t match any of my configured intents? Cluster them and tell me what they’re about.”
The assistant finds calls with no matching intent, reads the transcripts, and groups them into themes. For example, 60 calls in the last month might have asked about something your agent was never designed to handle, and all of them dropped off. Now you know what to do: add the intent, add the prompt branch, add the eval.
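
To make the idea of grouping unmatched calls concrete, here is a deliberately naive sketch. The assistant reads the transcripts itself. This example substitutes simple keyword matching for that step, and the call records, field names, and keyword lists are all invented for illustration.

```python
from collections import defaultdict


def group_unmatched_calls(calls, theme_keywords):
    """Group calls that have no intent into themes by keyword.

    Keyword matching is a stand-in for reading transcripts; it is not
    how the assistant clusters calls.
    """
    themes = defaultdict(list)
    for call in calls:
        if call.get("intent") is not None:
            continue  # already matched a configured intent
        text = call["transcript"].lower()
        theme = next(
            (name for name, words in theme_keywords.items()
             if any(word in text for word in words)),
            "other",
        )
        themes[theme].append(call["id"])
    return dict(themes)


# Invented sample data.
calls = [
    {"id": 1, "intent": "billing_dispute", "transcript": "I was charged twice"},
    {"id": 2, "intent": None, "transcript": "Is my blender still under warranty?"},
    {"id": 3, "intent": None, "transcript": "Does the warranty cover the motor?"},
    {"id": 4, "intent": None, "transcript": "What are your opening hours?"},
]
keywords = {"warranty": ["warranty", "guarantee"]}
print(group_unmatched_calls(calls, keywords))
```

A large theme in the result is a candidate for a new intent, prompt branch, and eval.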

The Loop

Diagnosis is not a one-time event. Each pass leaves your agent better instrumented:
  1. Diagnose what’s failing and why
  2. Fix the prompt or workflow to address it
  3. Add evals and data fields so the behavior is measured on every future call
  4. Re-diagnose to confirm the fix worked
The MCP handles all four steps in a single conversation. Over time, fewer things slip through undetected.
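
The four steps above form a loop that ends when a re-diagnosis comes back clean. The sketch below models only that control flow, with plain Python callables standing in for the assistant's work. None of these names come from Tuner, and the workspace state is simulated.

```python
from typing import Callable, List


def improvement_loop(
    diagnose: Callable[[], List[str]],
    fix: Callable[[str], None],
    instrument: Callable[[str], None],
    max_passes: int = 5,
) -> int:
    """Repeat diagnose -> fix -> instrument until a diagnosis finds nothing.

    Returns the number of passes that found at least one issue.
    """
    for passes_done in range(max_passes):
        issues = diagnose()        # step 1, and step 4 on later passes
        if not issues:
            return passes_done
        for issue in issues:
            fix(issue)             # step 2: change the prompt or workflow
            instrument(issue)      # step 3: add evals and data fields
    return max_passes


# Simulated state: one open issue, nothing tracked yet.
open_issues = ["billing disputes end unresolved"]
tracked = []

passes = improvement_loop(
    diagnose=lambda: list(open_issues),  # copy, so fixing does not disturb iteration
    fix=open_issues.remove,
    instrument=tracked.append,
)
print(passes, tracked)
```

The max_passes cap is a design choice for the sketch: it keeps the loop from running forever if a fix never takes effect.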

Keep Asking

After the initial diagnosis, keep the conversation going:
“Show me the transcripts of the worst calls behind that issue.”
“Did this problem start at a specific agent version, or has it always been there?”
“Which eval is failing the most this week, and on what kind of calls?”
“Draft the exact prompt change, then create the eval to track it.”
Diagnosis needs real call data. If your agent hasn’t received calls in the last 30 days, the assistant will ask you to send some through first. With fewer than 10 calls, it still runs but notes that the findings may not yet be statistically reliable.
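
The two thresholds in that note can be written as a quick pre-check. This is a sketch under one assumption the page does not state: that the 10-call threshold counts calls inside the same 30-day window. The function and status names are invented for this example.

```python
from datetime import datetime, timedelta, timezone

RECENT_WINDOW = timedelta(days=30)  # from the note: calls in the last 30 days
MIN_RELIABLE_CALLS = 10             # from the note: fewer than 10 is flagged


def diagnosis_readiness(call_times, now):
    """Classify whether there is enough recent call data to diagnose.

    Assumption: the 10-call threshold applies to calls within the window.
    """
    cutoff = now - RECENT_WINDOW
    recent = [t for t in call_times if t >= cutoff]
    if not recent:
        return "no_recent_calls"   # send some calls through first
    if len(recent) < MIN_RELIABLE_CALLS:
        return "low_confidence"    # runs, but findings may not be reliable
    return "ready"


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
three_recent = [now - timedelta(days=d) for d in (1, 5, 12)]
print(diagnosis_readiness(three_recent, now))
```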

Full Reference

For the complete guide, including the diagnosis output format, worked examples, and a full list of follow-up questions:

Diagnose Your Agent

The full reference: how the diagnosis works, what you get back, and detailed examples.

Not connected yet? Set up the MCP first:

Set Up Your Agent

Connect the Tuner MCP to your IDE or chatbot in a few minutes.