Self-hosted · OpenAI-compatible · Offline license

Run every LLM through one self-hosted gateway.

100+ models behind one OpenAI-compatible endpoint. Per-team budgets enforced before tokens burn. Every query auditable. Postgres as source of truth. Your data never leaves your infra.

$ curl -fsSL https://scutum.dev/install.sh | sh
Not a developer? Try our AI search product, Scutum Research.
scutum.your-company.com/cost
Scutum admin console — cost dashboard with sparkline, model usage donut, and per-model spend table
OpenAI-compatible / 100+ models / Postgres source of truth / Ed25519 offline license / Self-hosted
01 / Why Scutum

One control plane, built for self-hosted teams.

Cost predicted before tokens burn

Per-team budgets enforced pre-request, not after the bill arrives. tiktoken-based prediction refuses calls that would breach budget — before they cost you anything.

 team:eng     prompt → 1,240 tok ≈ $0.018  allow
 team:eng     prompt → 8,400 tok ≈ $0.121  allow
 team:demo    prompt →   420 tok ≈ $0.006  block (over $50/day)

Built as a LiteLLM hook.
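The pre-request check above can be pictured with a short sketch. This is illustrative, not Scutum's actual hook code: production uses tiktoken for token counts, while the chars-per-token heuristic here keeps the sketch dependency-free, and the prices and budgets are placeholder values.

```python
# Sketch of pre-request cost prediction. Prices, budgets, and the
# 4-chars-per-token heuristic are illustrative assumptions; the real
# hook uses tiktoken and live spend data.

PRICE_PER_1K = {"gpt-4o": 0.005, "claude-sonnet-4": 0.003}  # $/1K prompt tokens
DAILY_BUDGET = {"team:eng": 200.0, "team:demo": 50.0}        # $/day per team

def predict_tokens(prompt: str) -> int:
    """Rough stand-in for a tiktoken encoder: ~4 characters per token."""
    return max(1, len(prompt) // 4)

def check_budget(team: str, model: str, prompt: str, spent_today: float) -> bool:
    """Allow the call only if the predicted cost still fits today's budget."""
    predicted_cost = predict_tokens(prompt) / 1000 * PRICE_PER_1K[model]
    return spent_today + predicted_cost <= DAILY_BUDGET[team]

print(check_budget("team:eng", "gpt-4o", "Hello" * 100, spent_today=10.0))   # True
print(check_budget("team:demo", "gpt-4o", "Hello" * 100, spent_today=50.0))  # False
```

The key point is that the check runs before the request leaves the gateway: a blocked call costs zero tokens.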

Your data, your infra, your audit log

Every request logged with team, project, user, model, cost, and latency. Postgres is the single source of truth. Ed25519 offline licensing — no phone-home, no vendor cloud.

Air-gapped deployments supported.
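The audit fields above might map onto a Postgres table along these lines. This is an illustrative schema, not Scutum's actual DDL; all column names are assumptions.

```sql
-- Illustrative audit-log table; column names are assumptions.
CREATE TABLE request_audit (
    id          bigserial      PRIMARY KEY,
    ts          timestamptz    NOT NULL DEFAULT now(),
    team        text           NOT NULL,
    project     text,
    user_id     text           NOT NULL,
    model       text           NOT NULL,
    cost_usd    numeric(10, 6) NOT NULL,
    latency_ms  integer        NOT NULL
);
```

Because the log lives in your own Postgres, it is queryable with plain SQL and falls under your existing backup and retention tooling.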

100+ models behind one API

OpenAI, Anthropic, Google, Meta, Mistral, Cohere, xAI, DeepSeek, plus self-hosted via Ollama and vLLM. Routing in Cedar. Fallbacks, caching, A/B splits.

Drop-in replacement for any OpenAI SDK call.

02 / Drop-in

A drop-in replacement, not a rewrite.

Point your existing OpenAI SDK at Scutum. Get cost tracking, audit, routing, and budgets — without touching your application code.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://scutum.your-company.com/v1",
    api_key=os.environ["SCUTUM_API_KEY"]
)

response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}]
)

Same SDK. Same code. Different endpoint. Now every call is logged, budgeted, and routable.

03 / Live demo

See Scutum running. Live.

Scutum Research is a Perplexity-style search product running on Scutum gateway. Ask any question and the answer can be an interactive component — calculators, charts, comparison tables — rendered in real React, sandboxed, with sources cited inline. Every query routes through your gateway, every model is swappable, every answer hits your audit log.

Asking Scutum Research for a sprint planner. Sliders update the chart in real time.

Powered by Scutum · Every query is auditable · Every model is pluggable.

04 / Self-hosted

Built for teams that can't send data to a vendor cloud.

If you work in fintech, healthcare, or defense, or your team handles proprietary code or customer data, sending every LLM call through Portkey or Helicone isn't an option. You need the same control plane those products offer — cost governance, audit logs, multi-model routing — but running on infrastructure you control.

Scutum is that control plane. Postgres as the source of truth. Ed25519-signed offline licenses. Air-gapped deployments supported. Same enterprise features as the SaaS alternatives, none of the data exfiltration risk.

05 / Capabilities

Everything in one control plane.

OpenAI · Anthropic · Google · Meta · Mistral · Cohere · xAI · DeepSeek · Bedrock · Vertex · Azure · Ollama · vLLM

Routing & Reliability

  • Fallback chains
  • Model groups (round-robin, least-cost, lowest-latency)
  • Semantic caching
  • A/B traffic splits with auto-promote
  • Provider health checks
  • Conditional rules per team or cost band
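The fallback-chain idea in the list above can be sketched in a few lines. This is an illustrative sketch, not Scutum's routing engine; the provider names and callables are hypothetical stand-ins.

```python
# Illustrative fallback chain: try providers in order, return the first
# success. Provider names and callables below are hypothetical.

def route_with_fallback(prompt, chain):
    """Try each (name, call) pair in order; return (name, response) on success."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # provider failed -> record it, try the next one
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

def flaky_provider(prompt):
    raise TimeoutError("upstream timeout")

def healthy_provider(prompt):
    return f"answer to: {prompt}"

used, answer = route_with_fallback(
    "Hello", [("primary", flaky_provider), ("fallback", healthy_provider)]
)
print(used)  # fallback
```

Model groups, least-cost routing, and A/B splits follow the same shape: the chain is just ordered (or weighted) differently before this loop runs.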

Governance & Compliance

  • Per-team and per-project budgets
  • Cost prediction with tiktoken
  • DLP scanning with custom regex
  • Cedar policy language
  • Full audit log in Postgres
  • SSO via OIDC and SAML
  • Retention policies, GDPR delete-by-user
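The Cedar bullet above can be pictured with a short policy. The entity types, action name, and context attribute here are illustrative assumptions, not Scutum's actual schema; only the permit/when shape is standard Cedar.

```cedar
// Illustrative policy: let team:eng call premium models,
// but only when the predicted cost stays under $1 per request.
permit (
  principal in Team::"eng",
  action == Action::"chat.completions",
  resource in ModelGroup::"premium"
) when {
  context.predicted_cost_usd < 1.0
};
```

Cedar's default-deny semantics mean anything not explicitly permitted is blocked, which pairs naturally with per-team budgets.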
See full feature list →
06 / Pricing

Simple pricing. Self-hosted always.

Free / Self-host

$0

  • Single team
  • Cost tracking + audit
  • Community support
Self-host now →

Enterprise

Contact sales

  • Everything in Free
  • SSO, SAML, custom Cedar
  • SRE agent + A2A workflows
  • Air-gap support
  • Dedicated support
Talk to us →

All tiers are self-hosted. Your data never leaves your infrastructure.

Run it on your infra in under a minute.

$ curl -fsSL https://scutum.dev/install.sh | sh