Overview

Scorecard’s MCP (Model Context Protocol) server lets you manage projects, create testsets, configure metrics, run evaluations, and analyze results through natural language in any MCP-compatible client.

Available Tools

The MCP server exposes ~45 tools covering metrics, scores, systems, annotations, and documentation search.

Setting Up the MCP Server

Claude Code

Add the Scorecard remote MCP server with a single command:
claude mcp add --transport http scorecard https://mcp.scorecard.io/mcp
Complete the OAuth authentication flow in your browser when prompted. Verify the connection:
claude mcp list
You should see scorecard: https://mcp.scorecard.io/mcp (HTTP) - ✓ Connected.

Claude Desktop

Open Claude Desktop’s settings and select the “Connectors” tab. Click “Add custom connector”, paste the URL https://mcp.scorecard.io/mcp, then click “Add” and “Connect” to log in to Scorecard.

Local configuration

You can run the MCP server locally via npx:
export SCORECARD_API_KEY="your_api_key"
npx -y scorecard-ai-mcp@latest
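If you want the local server inside Claude Code rather than the remote one, you can register the stdio command directly. This is a sketch following Claude Code's documented `claude mcp add -e KEY=value -- command args` pattern; the server name `scorecard-local` is just a placeholder:

```shell
# Register the local stdio server with Claude Code (the name is arbitrary);
# -e passes the API key into the server process's environment.
claude mcp add scorecard-local -e SCORECARD_API_KEY=your_api_key \
  -- npx -y scorecard-ai-mcp@latest
```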
For clients with a configuration JSON:
{
  "mcpServers": {
    "scorecard_ai": {
      "command": "npx",
      "args": ["-y", "scorecard-ai-mcp", "--client=claude", "--tools=dynamic"],
      "env": {
        "SCORECARD_API_KEY": "ak_MyAPIKey"
      }
    }
  }
}
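Some clients (Cursor, for example) also accept a remote server entry with a url field instead of a local command. The fragment below is a sketch of that variant; check your client's MCP documentation for the exact schema it expects:

```json
{
  "mcpServers": {
    "scorecard": {
      "url": "https://mcp.scorecard.io/mcp"
    }
  }
}
```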

Examples

Create a project and testset

Create a new Scorecard project called "Support Bot Eval". Then create a testset
called "Support Scenarios" with 10 testcases. Each testcase should have:
- inputs: "customerMessage" and "category" (billing, technical, or product)
- expected: "idealResponse"

Create metrics

Create two metrics in the "Support Bot Eval" project:
1. "Response Accuracy" (integer 1-5) - How well does the response answer the question?
2. "Tone" (boolean) - Is the response professional and empathetic?

Analyze results

Show me the latest run results for the "Support Bot Eval" project.
Which testcases scored lowest on Response Accuracy?

Generate testcases from a codebase

In Claude Code, you can combine file access with the MCP server:
Read the API routes in src/api/ and generate 20 testcases covering
the edge cases for each endpoint. Add them to the "API Tests" testset
in project 1234.

Iterate on metrics

The "Response Accuracy" metric is too lenient — update the prompt template
to penalize responses that miss key details from the ideal response.

Technical Details