| Field | Value |
|---|---|
| Version | 1.0 |
| Last Updated | 2026-02 |
| Classification | Internal |
| Review Cycle | Quarterly |
1. System Overview
1.1 Architecture Context
┌─────────────────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY: CLUSTER                                                     │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ TRUST BOUNDARY: GATEWAY                                                 │ │
│ │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐               │ │
│ │  │   LiteLLM    │───▶│Agent Gateway │───▶│     vLLM     │               │ │
│ │  │  (L7 Proxy)  │    │ (Data Plane) │    │ (Inference)  │               │ │
│ │  └──────────────┘    └──────────────┘    └──────────────┘               │ │
│ │         │                   │                   │                       │ │
│ │         ▼                   ▼                   ▼                       │ │
│ │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐               │ │
│ │  │  PostgreSQL  │    │    Vault     │    │    Redis     │               │ │
│ │  │   (State)    │    │  (Secrets)   │    │   (Cache)    │               │ │
│ │  └──────────────┘    └──────────────┘    └──────────────┘               │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
           ▲                                                │
           │                                                ▼
      ┌─────────┐                                 ┌────────────────────┐
      │ Clients │                                 │ External LLM APIs  │
      │(Agents) │                                 │ (OpenAI/Anthropic) │
      └─────────┘                                 └────────────────────┘
1.2 Data Flow Summary
- Inbound: Client → Ingress → LiteLLM → Agent Gateway → Backend (vLLM/External)
- Secrets: Vault → Agent Gateway → External API calls
- Telemetry: All components → OTel Collector → Prometheus/Jaeger
- State: LiteLLM ↔ PostgreSQL (spend tracking, keys)
2. Assets
2.1 Critical Assets
| Asset | Classification | Location | Owner |
|---|---|---|---|
| API Provider Keys | SECRET | Vault | Platform Team |
| User API Keys | SECRET | PostgreSQL | Platform Team |
| LLM Request/Response Data | CONFIDENTIAL | In-transit, Redis cache | Data Owner |
| Cost/Spend Data | INTERNAL | PostgreSQL | FinOps Team |
| Cedar Policies | INTERNAL | ConfigMap/Vault | Security Team |
| TLS Certificates | SECRET | Cert-Manager/Vault | Platform Team |
2.2 Data Classification
- SECRET: Credentials, tokens, private keys - encrypted at rest, audit logged
- CONFIDENTIAL: PII, business-sensitive prompts/responses - encrypted, access-controlled
- INTERNAL: Operational data, metrics - access-controlled
- PUBLIC: Health endpoints, public documentation
3. Threat Actors
| Actor | Motivation | Capability | Likelihood |
|---|---|---|---|
| External Attacker | Data theft, service disruption | Medium-High | Medium |
| Malicious Insider | Data exfiltration, sabotage | High | Low |
| Compromised Agent | Lateral movement, data access | Medium | Medium |
| Supply Chain | Backdoor, credential theft | High | Low |
4. Threats (STRIDE Analysis)
4.1 Spoofing
| ID | Threat | Component | Mitigation | Priority |
|---|---|---|---|---|
| S1 | API key impersonation | LiteLLM | Key rotation, IP allowlisting, anomaly detection | HIGH |
| S2 | Service identity spoofing | Agent Gateway | mTLS between services, SPIFFE/SPIRE | HIGH |
| S3 | JWT token forgery | Agent Gateway | RS256 signing, short expiry, token binding | MEDIUM |
| S4 | MCP server impersonation | MCP Gateway | Server authentication, allowlisting | MEDIUM |
4.2 Tampering
| ID | Threat | Component | Mitigation | Priority |
|---|---|---|---|---|
| T1 | Prompt injection via API | LiteLLM/Agent GW | Input validation, content filtering | HIGH |
| T2 | Response manipulation | Agent Gateway | Response signing, integrity checks | MEDIUM |
| T3 | Configuration tampering | Kubernetes | GitOps, admission controllers, RBAC | HIGH |
| T4 | Database record modification | PostgreSQL | Audit logging, row-level security | MEDIUM |
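The "response signing, integrity checks" mitigation for T2 can be illustrated with a keyed hash. This is a minimal sketch that assumes HMAC-SHA256 with a shared key; the document does not specify the signing scheme, and the function names `sign_response` and `verify_response` are hypothetical.

```python
import hashlib
import hmac


def sign_response(body: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the response body."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_response(body: bytes, signature: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign_response(body, key)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # assumption: key would come from Vault
    body = b'{"completion": "hello"}'
    tag = sign_response(body, key)
    print(verify_response(body, tag, key))         # untouched body
    print(verify_response(body + b" ", tag, key))  # modified body
```

A shared-key scheme only proves integrity between parties that hold the key; an asymmetric signature would be needed if the verifier must not be able to forge responses.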
4.3 Repudiation
| ID | Threat | Component | Mitigation | Priority |
|---|---|---|---|---|
| R1 | Denied API usage | LiteLLM | Immutable audit logs, request signing | HIGH |
| R2 | Cost attribution disputes | FinOps Reporter | Tamper-evident spend logs | MEDIUM |
| R3 | Policy change denial | Vault/Cedar | Git-backed policies, change audit | MEDIUM |
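One common way to make spend logs tamper-evident (R2) is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is illustrative only: the record fields, the `GENESIS` value, and the function names are assumptions, not part of the documented system.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes identically.
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_entry(log: list, record: dict) -> dict:
    """Append a record whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev_hash": prev_hash,
             "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Verification only detects modification. To prevent an attacker from rewriting the whole chain, the latest hash would also need to be anchored somewhere the attacker cannot write.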
4.4 Information Disclosure
| ID | Threat | Component | Mitigation | Priority |
|---|---|---|---|---|
| I1 | API key exposure in logs | All | Log scrubbing, secret detection | CRITICAL |
| I2 | Prompt/response leakage | Redis cache | Encryption at rest, short TTL | HIGH |
| I3 | Model output exfiltration | Agent Gateway | DLP policies, output filtering | HIGH |
| I4 | Side-channel via timing | vLLM | Request padding, rate limiting | LOW |
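Log scrubbing for I1 can be sketched as a filter attached to Python's standard `logging` module. The two patterns below (an `sk-` prefixed token and a Bearer header) are assumed key shapes chosen for illustration; a real deployment would need patterns matching the credentials actually in use, and `ScrubbingFilter` is a hypothetical name.

```python
import logging
import re

# Assumed secret shapes; extend to match the key formats actually in use.
SECRET_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9_-]{8,}"), "[REDACTED]"),
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/-]+=*"), r"\1[REDACTED]"),
]


def scrub(text: str) -> str:
    """Replace anything that looks like a credential with a marker."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class ScrubbingFilter(logging.Filter):
    """Redact secrets from the fully formatted message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = scrub(record.getMessage())
        record.args = ()  # message is already formatted
        return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("gateway")
    logger.addFilter(ScrubbingFilter())
    logger.info("calling provider with key %s", "sk-abcdef1234567890")
```

Pattern-based scrubbing is a safety net rather than a guarantee, which is why the table pairs it with secret detection.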
4.5 Denial of Service
| ID | Threat | Component | Mitigation | Priority |
|---|---|---|---|---|
| D1 | Token exhaustion attack | LiteLLM | Budget limits, rate limiting | HIGH |
| D2 | Connection pool exhaustion | Agent Gateway | Connection limits, circuit breakers | HIGH |
| D3 | GPU memory exhaustion | vLLM | Request queuing, memory limits | MEDIUM |
| D4 | Cache poisoning | Redis | Authentication, input validation | MEDIUM |
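The rate limiting listed for D1 is often implemented as a token bucket. The following is a generic single-process sketch, not LiteLLM's own limiter; the class name and the injectable `now` parameter (used to make the behaviour testable) are assumptions.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=1.0, capacity=2.0)
    print([bucket.allow() for _ in range(3)])  # third call exceeds the burst
```

Passing the estimated LLM token count as `cost` would let one bucket limit tokens rather than requests. Sharing the limit across replicas would need a common store such as Redis.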
4.6 Elevation of Privilege
| ID | Threat | Component | Mitigation | Priority |
|---|---|---|---|---|
| E1 | Cross-tenant data access | LiteLLM | Team isolation, Cedar policies | CRITICAL |
| E2 | Vault token escalation | Vault | Least-privilege policies, token TTL | HIGH |
| E3 | Container escape | All Pods | Seccomp, AppArmor, non-root | HIGH |
| E4 | RBAC bypass | Kubernetes | Audit logging, admission webhooks | HIGH |
5. Attack Trees
5.1 Credential Theft Attack Tree
Goal: Steal API Provider Keys
├── 1. Extract from Vault [CRITICAL]
│   ├── 1.1 Compromise Vault token
│   │   ├── 1.1.1 Steal from pod environment
│   │   ├── 1.1.2 Intercept token renewal
│   │   └── 1.1.3 Exploit Vault vulnerability
│   └── 1.2 Access Vault storage backend
│       └── 1.2.1 Compromise etcd/consul
├── 2. Extract from Memory [HIGH]
│   ├── 2.1 Container escape + memory dump
│   └── 2.2 Core dump analysis
├── 3. Extract from Logs [MEDIUM]
│   ├── 3.1 Access application logs
│   └── 3.2 Access OTel traces
└── 4. Man-in-the-Middle [MEDIUM]
    ├── 4.1 Compromise service mesh
    └── 4.2 DNS hijacking
5.2 Budget Bypass Attack Tree
Goal: Exceed Budget Without Detection
├── 1. Direct API Access [HIGH]
│   ├── 1.1 Bypass LiteLLM proxy
│   │   └── 1.1.1 Direct vLLM access
│   └── 1.2 Use stolen provider key
├── 2. Attribution Evasion [MEDIUM]
│   ├── 2.1 Spoof user/team headers
│   └── 2.2 Use shared/default key
└── 3. Exploit Async Tracking [MEDIUM]
    ├── 3.1 Race condition in spend update
    └── 3.2 Overwhelm tracking pipeline
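Branch 3.1 (race condition in spend update) arises when the budget check and the spend write are separate steps, so concurrent requests can all pass the check before any of them records its cost. A minimal in-process sketch of the usual countermeasure, an atomic reserve-then-settle step, is shown below. `BudgetLedger` and its methods are hypothetical names, and a real deployment would need the same atomicity at the database level rather than a Python lock.

```python
import threading


class BudgetLedger:
    """Check and update spend under one lock so requests cannot race."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0
        self._lock = threading.Lock()

    def try_reserve(self, estimate_usd: float) -> bool:
        """Atomically reserve the estimated cost, or refuse if over budget."""
        with self._lock:
            if self.spent_usd + estimate_usd > self.limit_usd:
                return False
            self.spent_usd += estimate_usd
            return True

    def settle(self, estimate_usd: float, actual_usd: float) -> None:
        """Replace the reservation with the actual cost once it is known."""
        with self._lock:
            self.spent_usd += actual_usd - estimate_usd


if __name__ == "__main__":
    ledger = BudgetLedger(limit_usd=10.0)
    if ledger.try_reserve(1.0):
        ledger.settle(estimate_usd=1.0, actual_usd=0.5)
    print(ledger.spent_usd)
```

Reserving before the call means a burst of concurrent requests is bounded by the budget even when the actual cost is only tracked asynchronously afterwards.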
6. Security Controls
6.1 Preventive Controls
| Control | Threats Mitigated | Implementation |
|---|---|---|
| mTLS | S2, I1 | Istio/Linkerd service mesh |
| Cedar RBAC | E1, E4 | Agent Gateway policies |
| Network Policies | D1, E1 | Kubernetes NetworkPolicy |
| Pod Security Standards | E3 | Restricted PSS profile |
| Input Validation | T1 | LiteLLM guardrails |
| Budget Enforcement | D1 | LiteLLM + Budget Webhook |
6.2 Detective Controls
| Control | Threats Detected | Implementation |
|---|---|---|
| Audit Logging | R1, R2, R3 | OTel → Loki/Splunk |
| Anomaly Detection | S1, D1 | Prometheus alerts |
| Secret Scanning | I1 | Trivy, Gitleaks |
| Runtime Security | E3, T3 | Falco |
6.3 Corrective Controls
| Control | Responds To | Implementation |
|---|---|---|
| Key Rotation | Credential compromise | Vault auto-rotate |
| Circuit Breaker | Service degradation | Agent Gateway |
| Auto-scaling | Load spike | KEDA + HPA |
| Incident Runbooks | Security events | PagerDuty integration |
7. Risk Register
| Risk ID | Description | Likelihood | Impact | Risk Level | Mitigation Status |
|---|---|---|---|---|---|
| R-001 | API key exposure via logs | Medium | Critical | HIGH | Mitigated (log scrubbing) |
| R-002 | Cross-tenant data access | Low | Critical | MEDIUM | Mitigated (Cedar policies) |
| R-003 | Budget bypass via direct access | Medium | High | HIGH | Mitigated (NetworkPolicy) |
| R-004 | Prompt injection attacks | High | Medium | HIGH | Mitigated (LLM Guard + Presidio) |
| R-005 | Supply chain compromise | Low | Critical | MEDIUM | Mitigated (image signing) |
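The register does not state how Risk Level is derived from Likelihood and Impact. One scheme that reproduces all five rows above is a simple likelihood × impact product with fixed thresholds. The numeric weights and thresholds in this sketch are assumptions chosen to match those rows, not a documented scoring policy.

```python
# Assumed ordinal weights; not defined anywhere in the register itself.
LIKELIHOOD = {"Low": 1, "Medium": 2, "High": 3}
IMPACT = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}


def risk_level(likelihood: str, impact: str) -> str:
    """Map a likelihood/impact pair to a level via their product."""
    score = LIKELIHOOD[likelihood] * IMPACT[impact]
    if score >= 6:  # assumed threshold
        return "HIGH"
    if score >= 3:  # assumed threshold
        return "MEDIUM"
    return "LOW"


if __name__ == "__main__":
    for row in [("Medium", "Critical"), ("Low", "Critical"), ("High", "Medium")]:
        print(row, risk_level(*row))
```

Writing the rule down this way makes it easy to check new register entries for consistency with existing ones.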
8. Compliance Mapping
| Requirement | Control | Evidence |
|---|---|---|
| SOC2 CC6.1 | Access control | Cedar policies, Vault ACLs |
| SOC2 CC6.6 | Encryption | TLS 1.3, Vault encryption |
| SOC2 CC7.2 | Monitoring | OTel traces, Prometheus alerts |
| GDPR Art. 32 | Data protection | Encryption, access logging |
| PCI-DSS 3.4 | Key management | Vault, rotation policies |
9. Review and Updates
9.1 Review Triggers
- Quarterly scheduled review
- New component addition
- Security incident
- Significant architecture change
- Compliance audit finding
9.2 Change Log
| Date | Version | Author | Changes |
|---|---|---|---|
| 2026-02 | 1.0 | Platform Team | Initial threat model |
Appendix A: DREAD Scoring
| Threat | Damage | Reproducibility | Exploitability | Affected Users | Discoverability | Score |
|---|---|---|---|---|---|---|
| I1 - Key exposure | 10 | 8 | 6 | 10 | 7 | 8.2 |
| E1 - Cross-tenant | 10 | 5 | 4 | 8 | 3 | 6.0 |
| D1 - Token exhaust | 6 | 9 | 8 | 10 | 8 | 8.2 |
| T1 - Prompt inject | 7 | 8 | 7 | 6 | 9 | 7.4 |
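The Score column is consistent with the arithmetic mean of the five DREAD components, each rated 1 to 10. For example, I1 gives (10 + 8 + 6 + 10 + 7) / 5 = 8.2. A small sketch that reproduces the table's scores follows; the function name `dread_score` is illustrative.

```python
def dread_score(damage: int, reproducibility: int, exploitability: int,
                affected_users: int, discoverability: int) -> float:
    """Return the mean of the five DREAD ratings, rounded to one decimal."""
    ratings = (damage, reproducibility, exploitability,
               affected_users, discoverability)
    return round(sum(ratings) / len(ratings), 1)


if __name__ == "__main__":
    print(dread_score(10, 8, 6, 10, 7))  # I1 - Key exposure
    print(dread_score(7, 8, 7, 6, 9))    # T1 - Prompt inject
```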