Systems: Beyond Prompts

Systems at a glance

A System brings together everything that shapes your AI agent’s behavior: prompts, model settings, tools, and configuration. Instead of tweaking prompts in isolation, you version the complete agent setup and test it as one unit.

Screenshot of the Systems list in the UI.

Why it matters

Systems keep prompts, parameters, tools, and configuration together, give you versioned changes you can compare, and make it clear which version is in production. For teams building AI agents and complex workflows, this helps control configuration drift, preserves an audit trail of what’s live, and ensures you evaluate exactly what users experience.

If you call it something else

Agent configuration / prompt pipeline / workflow config: Systems capture the whole thing.
Where versions live: Each System has multiple versions; you can label one as production and compare to latest.
What’s captured: Prompts, model IDs and parameters, tool definitions, routing settings, and custom flags that change agent behavior.

How versions work

Each System has versioned configurations. You can mark any version as production and keep iterating on latest; if no production version is set, the latest version is used until you choose one. New versions are auto‑named (“Version N”), and saving a configuration identical to an existing one reuses that version instead of creating a new one. Deleting a System from the UI archives it, so its history is preserved.
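The versioning rules above can be modeled in a few lines. This is a minimal sketch for building intuition, not Scorecard's implementation: the `SystemVersions` class, its method names, and the choice to detect identical configs by hashing canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def config_fingerprint(config: dict) -> str:
    """Hash a canonical JSON form so key order does not affect identity (assumed rule)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class SystemVersions:
    """Toy model of one System's version list (hypothetical, for illustration only)."""

    # Each entry is (version name, config fingerprint, config).
    versions: List[Tuple[str, str, dict]] = field(default_factory=list)
    production: Optional[str] = None

    def add(self, config: dict) -> str:
        """Return the existing version for an identical config, else create "Version N"."""
        fingerprint = config_fingerprint(config)
        for name, existing, _ in self.versions:
            if existing == fingerprint:
                return name
        name = f"Version {len(self.versions) + 1}"
        self.versions.append((name, fingerprint, config))
        return name

    def mark_production(self, name: str) -> None:
        """Label an existing version as production."""
        if all(existing != name for existing, _, _ in self.versions):
            raise KeyError(name)
        self.production = name

    def effective(self) -> str:
        """Production if set, otherwise the latest version."""
        if self.production is not None:
            return self.production
        if not self.versions:
            raise LookupError("no versions yet")
        return self.versions[-1][0]


system = SystemVersions()
first = system.add({"model": "example-model", "temperature": 0.2})
same = system.add({"temperature": 0.2, "model": "example-model"})  # same content, new key order
second = system.add({"model": "example-model", "temperature": 0.7})
print(first, same, second)   # Version 1 Version 1 Version 2
print(system.effective())    # Version 2 (no production set, so latest is used)
system.mark_production(first)
print(system.effective())    # Version 1
```

Hashing a sorted-key serialization is one common way to treat two configs as identical regardless of key order; the product may use a different comparison.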

Create a System

Click New System, give it a name and description, and paste configuration JSON for your agent (whatever drives your agent’s behavior, e.g., tools, temperature, routing logic, or custom flags). Scorecard versions this for you.
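As a rough illustration of what that configuration JSON might contain, the sketch below assembles one in Python and prints it for pasting. Every key and value here (`model`, `system_prompt`, `routing`, `flags`, the tool entry, and the model names) is hypothetical; use whatever fields actually drive your agent.

```python
import json

# Hypothetical agent configuration; all keys and values are illustrative.
agent_config = {
    "model": "example-model-v1",
    "temperature": 0.3,
    "system_prompt": "You are a concise support assistant.",
    "tools": [
        {"name": "search_docs", "description": "Search the product documentation."}
    ],
    "routing": {"fallback_model": "example-model-small"},
    "flags": {"enable_citations": True},
}

# Serialize with sorted keys so the same settings always produce the same text.
config_json = json.dumps(agent_config, indent=2, sort_keys=True)
print(config_json)
```

Generating the JSON from the same object your agent reads at runtime helps keep the configuration you test aligned with the configuration you ship.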

Screenshot of creating a new System in the UI.

Inspect a System

Open a System to view details, timestamps, and all versions. You can quickly scan configurations and see which version is latest or marked as production.
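When two versions look similar, a small diff can show exactly what changed between them. The following is a sketch that runs on configuration JSON you have copied out of two versions; the `flatten` and `diff_configs` helpers and the sample configs are assumptions for illustration, not part of the product.

```python
MISSING = "<missing>"  # placeholder for a setting that exists on only one side


def flatten(config: dict, prefix: str = "") -> dict:
    """Turn nested dicts into dotted paths, e.g. {"routing": {"retries": 2}} -> {"routing.retries": 2}."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(old: dict, new: dict) -> dict:
    """Map each changed path to a (before, after) pair."""
    old_flat, new_flat = flatten(old), flatten(new)
    changes = {}
    for path in sorted(old_flat.keys() | new_flat.keys()):
        before = old_flat.get(path, MISSING)
        after = new_flat.get(path, MISSING)
        if before != after:
            changes[path] = (before, after)
    return changes


# Hypothetical configs standing in for a production version and the latest version.
production = {"model": "example-model", "temperature": 0.2, "routing": {"fallback": "small"}}
latest = {"model": "example-model", "temperature": 0.7, "routing": {"fallback": "small", "retries": 2}}

for path, (before, after) in diff_configs(production, latest).items():
    print(f"{path}: {before} -> {after}")
# routing.retries: <missing> -> 2
# temperature: 0.2 -> 0.7
```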

Screenshot of System details and versions in the UI.

Test the full setup

Systems work hand in hand with Testsets, Metrics, and Runs, so you evaluate realistic agent changes rather than prompt text alone. To see how different configurations affect agent performance, try the Joke Bot Quickstart.

Use cases

A/B test different agent configurations across the same testset.
Tune temperature, system messages, tools, or routing logic and measure impact.
Promote the best-performing agent version to production, and roll back to an earlier version if needed.
Track regressions across agent releases by re-running previous versions.
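The A/B pattern in the list above can be sketched with plain Python before wiring it to real runs. Everything here is a toy under stated assumptions: the two "agents" are lookup tables standing in for two System versions, the testset has two made-up cases, and `exact_match` is a simple stand-in metric rather than a Scorecard Metric.

```python
from statistics import mean
from typing import Callable, Dict, List


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the answer matches, ignoring case and surrounding whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def evaluate(agent: Callable[[str], str], testset: List[dict]) -> float:
    """Average the metric over every testcase for one agent version."""
    return mean(exact_match(agent(case["input"]), case["expected"]) for case in testset)


def compare_versions(agents: Dict[str, Callable[[str], str]], testset: List[dict]) -> Dict[str, float]:
    """Run every version against the same testset so the scores are comparable."""
    return {name: evaluate(agent, testset) for name, agent in agents.items()}


# Hypothetical testset and stub agents; a real agent would call a model here.
testset = [
    {"input": "capital of France", "expected": "Paris"},
    {"input": "capital of Japan", "expected": "Tokyo"},
]
answers_v1 = {"capital of France": "Paris", "capital of Japan": "Kyoto"}
answers_v2 = {"capital of France": "Paris", "capital of Japan": "Tokyo"}
agents = {
    "Version 1": lambda question: answers_v1.get(question, ""),
    "Version 2": lambda question: answers_v2.get(question, ""),
}

scores = compare_versions(agents, testset)
best = max(scores, key=scores.get)
print(scores)  # {'Version 1': 0.5, 'Version 2': 1.0}
print(best)    # Version 2
```

Holding the testset and metric fixed while varying only the agent version is what makes the comparison meaningful; re-running an older version through the same function is also how a regression would show up.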
