Overview

Scorecard’s MCP (Model Context Protocol) server lets you manage projects, create testsets, configure metrics, run evaluations, and analyze results through natural language in any MCP-compatible client.

Available Tools

The MCP server exposes ~45 tools covering metrics, scores, systems, annotations, and documentation search.

Setting Up the MCP Server

Claude Code

Add the Scorecard remote MCP server with a single command:
claude mcp add --transport http scorecard https://mcp.scorecard.io/mcp
Complete the OAuth authentication flow in your browser when prompted. Verify the connection:
claude mcp list
You should see scorecard: https://mcp.scorecard.io/mcp (HTTP) - ✓ Connected.

Claude Desktop

Open Claude Desktop’s settings and select the “Connectors” tab. Click “Add custom connector”, paste the URL https://mcp.scorecard.io/mcp, then click “Add” and “Connect” to log in to Scorecard.

Local configuration

You can run the MCP server locally via npx:
export SCORECARD_API_KEY="your_api_key"
npx -y scorecard-ai-mcp@latest
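If you want the local server inside Claude Code rather than the remote one, you can register the stdio command directly. This is a sketch following Claude Code's documented `claude mcp add -e KEY=value -- command args` pattern; the server name `scorecard-local` is just a placeholder:

```shell
# Register the local stdio server with Claude Code (the name is arbitrary);
# -e passes the API key into the server process's environment.
claude mcp add scorecard-local -e SCORECARD_API_KEY=your_api_key \
  -- npx -y scorecard-ai-mcp@latest
```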
For clients with a configuration JSON:
{
  "mcpServers": {
    "scorecard_ai": {
      "command": "npx",
      "args": ["-y", "scorecard-ai-mcp", "--client=claude", "--tools=dynamic"],
      "env": {
        "SCORECARD_API_KEY": "ak_MyAPIKey"
      }
    }
  }
}
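Some clients (Cursor, for example) also accept a remote server entry with a url field instead of a local command. The fragment below is a sketch of that variant; check your client's MCP documentation for the exact schema it expects:

```json
{
  "mcpServers": {
    "scorecard": {
      "url": "https://mcp.scorecard.io/mcp"
    }
  }
}
```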

Examples

Create a project and testset

Create a new Scorecard project called "Support Bot Eval". Then create a testset
called "Support Scenarios" with 10 testcases. Each testcase should have:
- inputs: "customerMessage" and "category" (billing, technical, or product)
- expected: "idealResponse"

Create metrics

Create two metrics in the "Support Bot Eval" project:
1. "Response Accuracy" (integer 1-5) - How well does the response answer the question?
2. "Tone" (boolean) - Is the response professional and empathetic?

Analyze results

Show me the latest run results for the "Support Bot Eval" project.
Which testcases scored lowest on Response Accuracy?

Generate testcases from a codebase

In Claude Code, you can combine file access with the MCP server:
Read the API routes in src/api/ and generate 20 testcases covering
the edge cases for each endpoint. Add them to the "API Tests" testset
in project 1234.

Iterate on metrics

The "Response Accuracy" metric is too lenient — update the prompt template
to penalize responses that miss key details from the ideal response.

Technical Details