Self-hosted · OpenAI-compatible · Offline license

Run every LLM through one self-hosted gateway.

100+ models behind one OpenAI-compatible endpoint. Per-team budgets enforced before tokens burn. Every query auditable. Postgres as source of truth. Your data never leaves your infra.

$ curl -fsSL https://scutum.dev/install.sh | sh
Not a developer? Try our AI search product, Scutum Research.
scutum.your-company.com/cost
Scutum admin console — cost dashboard with sparkline, model usage donut, and per-model spend table
OpenAI-compatible / 100+ models / Postgres source of truth / Ed25519 offline license / Self-hosted
01 / Why Scutum

One control plane, built for self-hosted teams.

Cost predicted before tokens burn

Per-team budgets enforced pre-request, not after the bill arrives. tiktoken-based prediction refuses calls that would breach budget — before they cost you anything.

 team:eng     prompt → 1,240 tok ≈ $0.018  allow
 team:eng     prompt → 8,400 tok ≈ $0.121  allow
 team:demo    prompt →   420 tok ≈ $0.006  block (over $50/day)

Built as a LiteLLM hook.
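The pre-request check above can be pictured with a short sketch. This is illustrative, not Scutum's actual hook code: production uses tiktoken for token counts, while the chars-per-token heuristic here keeps the sketch dependency-free, and the prices and budgets are placeholder values.

```python
# Sketch of pre-request cost prediction. Prices, budgets, and the
# 4-chars-per-token heuristic are illustrative assumptions; the real
# hook uses tiktoken and live spend data.

PRICE_PER_1K = {"gpt-4o": 0.005, "claude-sonnet-4": 0.003}  # $/1K prompt tokens
DAILY_BUDGET = {"team:eng": 200.0, "team:demo": 50.0}        # $/day per team

def predict_tokens(prompt: str) -> int:
    """Rough stand-in for a tiktoken encoder: ~4 characters per token."""
    return max(1, len(prompt) // 4)

def check_budget(team: str, model: str, prompt: str, spent_today: float) -> bool:
    """Allow the call only if the predicted cost still fits today's budget."""
    predicted_cost = predict_tokens(prompt) / 1000 * PRICE_PER_1K[model]
    return spent_today + predicted_cost <= DAILY_BUDGET[team]

print(check_budget("team:eng", "gpt-4o", "Hello" * 100, spent_today=10.0))   # True
print(check_budget("team:demo", "gpt-4o", "Hello" * 100, spent_today=50.0))  # False
```

The key point is that the check runs before the request leaves the gateway: a blocked call costs zero tokens.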

Your data, your infra, your audit log

Every request logged with team, project, user, model, cost, and latency. Postgres is the single source of truth. Ed25519 offline licensing — no phone-home, no vendor cloud.

Air-gapped deployments supported.
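The audit fields above might map onto a Postgres table along these lines. This is an illustrative schema, not Scutum's actual DDL; all column names are assumptions.

```sql
-- Illustrative audit-log table; column names are assumptions.
CREATE TABLE request_audit (
    id          bigserial      PRIMARY KEY,
    ts          timestamptz    NOT NULL DEFAULT now(),
    team        text           NOT NULL,
    project     text,
    user_id     text           NOT NULL,
    model       text           NOT NULL,
    cost_usd    numeric(10, 6) NOT NULL,
    latency_ms  integer        NOT NULL
);
```

Because the log lives in your own Postgres, it is queryable with plain SQL and falls under your existing backup and retention tooling.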

100+ models behind one API

OpenAI, Anthropic, Google, Meta, Mistral, Cohere, xAI, DeepSeek, plus self-hosted via Ollama and vLLM. Routing in Cedar. Fallbacks, caching, A/B splits.

Drop-in replacement for any OpenAI SDK call.

02 / Drop-in

A drop-in replacement, not a rewrite.

Point your existing OpenAI SDK at Scutum. Get cost tracking, audit, routing, and budgets — without touching your application code.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://scutum.your-company.com/v1",
    api_key=os.environ["SCUTUM_API_KEY"]
)

response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}]
)

Same SDK. Same code. Different endpoint. Now every call is logged, budgeted, and routable.

03 / Live demo

See Scutum running. Live.

Scutum Research is a Perplexity-style search product running on Scutum gateway. Ask any question and the answer can be an interactive component — calculators, charts, comparison tables — rendered in real React, sandboxed, with sources cited inline. Every query routes through your gateway, every model is swappable, every answer hits your audit log.

Asking Scutum Research for a sprint planner. Sliders update the chart in real time.

Powered by Scutum · Every query is auditable · Every model is pluggable.

04 / Self-hosted

Built for teams that can't send data to a vendor cloud.

If you work in fintech, healthcare, or defense, or your team handles proprietary code or customer data, sending every LLM call through Portkey or Helicone isn't an option. You need the same control plane those products offer — cost governance, audit logs, multi-model routing — but running on infrastructure you control.

Scutum is that control plane. Postgres as the source of truth. Ed25519-signed offline licenses. Air-gapped deployments supported. Same enterprise features as the SaaS alternatives, none of the data exfiltration risk.

05 / Capabilities

Everything in one control plane.

OpenAI · Anthropic · Google · Meta · Mistral · Cohere · xAI · DeepSeek · Bedrock · Vertex · Azure · Ollama · vLLM

Routing & Reliability

  • Fallback chains
  • Model groups (round-robin, least-cost, lowest-latency)
  • Semantic caching
  • A/B traffic splits with auto-promote
  • Provider health checks
  • Conditional rules per team or cost band
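The fallback-chain idea in the list above can be sketched in a few lines. This is an illustrative sketch, not Scutum's routing engine; the provider names and callables are hypothetical stand-ins.

```python
# Illustrative fallback chain: try providers in order, return the first
# success. Provider names and callables below are hypothetical.

def route_with_fallback(prompt, chain):
    """Try each (name, call) pair in order; return (name, response) on success."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # provider failed -> record it, try the next one
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

def flaky_provider(prompt):
    raise TimeoutError("upstream timeout")

def healthy_provider(prompt):
    return f"answer to: {prompt}"

used, answer = route_with_fallback(
    "Hello", [("primary", flaky_provider), ("fallback", healthy_provider)]
)
print(used)  # fallback
```

Model groups, least-cost routing, and A/B splits follow the same shape: the chain is just ordered (or weighted) differently before this loop runs.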

Governance & Compliance

  • Per-team and per-project budgets
  • Cost prediction with tiktoken
  • DLP scanning with custom regex
  • Cedar policy language
  • Full audit log in Postgres
  • SSO via OIDC and SAML
  • Retention policies, GDPR delete-by-user
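The Cedar bullet above can be pictured with a short policy. The entity types, action name, and context attribute here are illustrative assumptions, not Scutum's actual schema; only the permit/when shape is standard Cedar.

```cedar
// Illustrative policy: let team:eng call premium models,
// but only when the predicted cost stays under $1 per request.
permit (
  principal in Team::"eng",
  action == Action::"chat.completions",
  resource in ModelGroup::"premium"
) when {
  context.predicted_cost_usd < 1.0
};
```

Cedar's default-deny semantics mean anything not explicitly permitted is blocked, which pairs naturally with per-team budgets.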
See full feature list →
06 / Pricing

Simple pricing. Self-hosted always.

Free / Self-host

$0

  • Single team
  • Cost tracking + audit
  • Community support
Self-host now →

Enterprise

Contact sales

  • Everything in Free
  • SSO, SAML, custom Cedar
  • SRE agent + A2A workflows
  • Air-gap support
  • Dedicated support
Talk to us →

All tiers are self-hosted. Your data never leaves your infrastructure.

Run it on your infra in under a minute.

$ curl -fsSL https://scutum.dev/install.sh | sh