Cost predicted before tokens burn
Per-team budgets enforced pre-request, not after the bill arrives. tiktoken-based prediction refuses calls that would breach the budget, before they cost you anything.
✓ team:eng prompt → 1,240 tok ≈ $0.018 allow
✓ team:eng prompt → 8,400 tok ≈ $0.121 allow
✗ team:demo prompt → 420 tok ≈ $0.006 block (over $50/day)
Built as a LiteLLM hook.
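The pre-request check above can be sketched as below. This is an illustration, not the shipped hook: a crude chars/4 heuristic stands in for tiktoken, and the prices, budgets, and spend figures are made up.

```python
# Minimal sketch of a pre-request budget check. A chars/4 heuristic stands in
# for a real tiktoken count; all prices, limits, and spend values are invented.

DAILY_BUDGET_USD = {"team:eng": 500.00, "team:demo": 50.00}  # hypothetical limits
PRICE_PER_1K_TOKENS = 0.0145  # hypothetical blended input price

# In the real gateway this would be read from Postgres, not a dict.
spent_today = {"team:eng": 12.40, "team:demo": 49.998}


def estimate_tokens(prompt: str) -> int:
    """Rough stand-in for a tiktoken count: ~4 characters per token."""
    return max(1, len(prompt) // 4)


def check_budget(team: str, prompt: str) -> tuple[bool, float]:
    """Return (allow, predicted_cost) before any tokens are sent upstream."""
    tokens = estimate_tokens(prompt)
    cost = tokens / 1000 * PRICE_PER_1K_TOKENS
    allow = spent_today[team] + cost <= DAILY_BUDGET_USD[team]
    return allow, round(cost, 6)
```

The point is the ordering: the cost estimate and the allow/block decision both happen before the upstream call, so a blocked request costs nothing.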
Your data, your infra, your audit log
Every request logged with team, project, user, model, cost, and latency. Postgres is the single source of truth. Ed25519 offline licensing: no phone-home, no vendor cloud.
Air-gapped deployments supported.
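The audit row described above can be sketched as a schema plus one insert. sqlite3 stands in for Postgres so the sketch runs anywhere; the column names and the `log_request` helper are illustrative, not the actual schema.

```python
import sqlite3
import time

# Illustrative audit-log schema: one row per request, fully attributed.
# Postgres is the real store; sqlite3 is a stand-in for this sketch.
SCHEMA = """
CREATE TABLE audit_log (
    ts          REAL NOT NULL,
    team        TEXT NOT NULL,
    project     TEXT NOT NULL,
    user_id     TEXT NOT NULL,
    model       TEXT NOT NULL,
    cost_usd    REAL NOT NULL,
    latency_ms  REAL NOT NULL
)
"""


def log_request(db, team, project, user_id, model, cost_usd, latency_ms):
    """Append one fully attributed request to the audit log."""
    db.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?, ?)",
        (time.time(), team, project, user_id, model, cost_usd, latency_ms),
    )


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
log_request(db, "team:eng", "search", "alice", "gpt-4o", 0.018, 812.5)
row = db.execute("SELECT team, model, cost_usd FROM audit_log").fetchone()
```

Because every row carries team, project, and user, per-team spend and latency roll-ups are a single `GROUP BY` away.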
100+ models behind one API
OpenAI, Anthropic, Google, Meta, Mistral, Cohere, xAI, DeepSeek, plus self-hosted via Ollama and vLLM. Routing in Cedar. Fallbacks, caching, A/B splits.
Drop-in replacement for any OpenAI SDK call.
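Drop-in here means the gateway speaks the OpenAI chat-completions wire format. A stdlib-only sketch of the request shape follows; the gateway URL, API key, and model name are placeholders, and in practice you would just point an OpenAI SDK's `base_url` at the gateway instead.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:4000/v1/chat/completions"  # placeholder address
API_KEY = "sk-local-team-key"  # placeholder per-team key

# The same JSON body any OpenAI SDK would send; only the destination changes.
payload = {
    "model": "claude-sonnet-4",  # placeholder; routing is the gateway's job
    "messages": [{"role": "user", "content": "Summarize our Q3 incident log."}],
}

request = urllib.request.Request(
    GATEWAY_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

No client code changes beyond the base URL and key: budgets, logging, and fallbacks all happen gateway-side.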