AI Control Plane Platform - Threat Model#

Document Information#

Field	Value
Version	1.0
Last Updated	2026-02
Classification	Internal
Review Cycle	Quarterly

1. System Overview#

1.1 Architecture Context#

┌─────────────────────────────────────────────────────────────────────────────┐
│                              TRUST BOUNDARY: CLUSTER                         │
│  ┌─────────────────────────────────────────────────────────────────────────┐│
│  │                        TRUST BOUNDARY: GATEWAY                          ││
│  │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐              ││
│  │  │   LiteLLM    │───▶│Agent Gateway │───▶│    vLLM      │              ││
│  │  │   (L7 Proxy) │    │ (Data Plane) │    │  (Inference) │              ││
│  │  └──────────────┘    └──────────────┘    └──────────────┘              ││
│  │         │                   │                   │                       ││
│  │         ▼                   ▼                   ▼                       ││
│  │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐              ││
│  │  │  PostgreSQL  │    │    Vault     │    │    Redis     │              ││
│  │  │   (State)    │    │  (Secrets)   │    │   (Cache)    │              ││
│  │  └──────────────┘    └──────────────┘    └──────────────┘              ││
│  └─────────────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────────┘
         ▲                                                    │
         │                                                    ▼
    ┌─────────┐                                    ┌──────────────────┐
    │ Clients │                                    │ External LLM APIs│
    │(Agents) │                                    │ (OpenAI/Anthropic)│
    └─────────┘                                    └──────────────────┘

1.2 Data Flow Summary#

Inbound: Client → Ingress → LiteLLM → Agent Gateway → Backend (vLLM/External)
Secrets: Vault → Agent Gateway → External API calls
Telemetry: All components → OTel Collector → Prometheus/Jaeger
State: LiteLLM ↔ PostgreSQL (spend tracking, keys)

2. Assets#

2.1 Critical Assets#

Asset	Classification	Location	Owner
API Provider Keys	SECRET	Vault	Platform Team
User API Keys	SECRET	PostgreSQL	Platform Team
LLM Request/Response Data	CONFIDENTIAL	In-transit, Redis cache	Data Owner
Cost/Spend Data	INTERNAL	PostgreSQL	FinOps Team
Cedar Policies	INTERNAL	ConfigMap/Vault	Security Team
TLS Certificates	SECRET	Cert-Manager/Vault	Platform Team

2.2 Data Classification#

SECRET: Credentials, tokens, private keys - encrypted at rest, audit logged
CONFIDENTIAL: PII, business-sensitive prompts/responses - encrypted, access-controlled
INTERNAL: Operational data, metrics - access-controlled
PUBLIC: Health endpoints, public documentation

3. Threat Actors#

Actor	Motivation	Capability	Likelihood
External Attacker	Data theft, service disruption	Medium-High	Medium
Malicious Insider	Data exfiltration, sabotage	High	Low
Compromised Agent	Lateral movement, data access	Medium	Medium
Supply Chain	Backdoor, credential theft	High	Low

4. Threats (STRIDE Analysis)#

4.1 Spoofing#

ID	Threat	Component	Mitigation	Priority
S1	API key impersonation	LiteLLM	Key rotation, IP allowlisting, anomaly detection	HIGH
S2	Service identity spoofing	Agent Gateway	mTLS between services, SPIFFE/SPIRE	HIGH
S3	JWT token forgery	Agent Gateway	RS256 signing, short expiry, token binding	MEDIUM
S4	MCP server impersonation	MCP Gateway	Server authentication, allowlisting	MEDIUM

4.2 Tampering#

ID	Threat	Component	Mitigation	Priority
T1	Prompt injection via API	LiteLLM/Agent GW	Input validation, content filtering	HIGH
T2	Response manipulation	Agent Gateway	Response signing, integrity checks	MEDIUM
T3	Configuration tampering	Kubernetes	GitOps, admission controllers, RBAC	HIGH
T4	Database record modification	PostgreSQL	Audit logging, row-level security	MEDIUM

4.3 Repudiation#

ID	Threat	Component	Mitigation	Priority
R1	Denied API usage	LiteLLM	Immutable audit logs, request signing	HIGH
R2	Cost attribution disputes	FinOps Reporter	Tamper-evident spend logs	MEDIUM
R3	Policy change denial	Vault/Cedar	Git-backed policies, change audit	MEDIUM

4.4 Information Disclosure#

ID	Threat	Component	Mitigation	Priority
I1	API key exposure in logs	All	Log scrubbing, secret detection	CRITICAL
I2	Prompt/response leakage	Redis cache	Encryption at rest, short TTL	HIGH
I3	Model output exfiltration	Agent Gateway	DLP policies, output filtering	HIGH
I4	Side-channel via timing	vLLM	Request padding, rate limiting	LOW

4.5 Denial of Service#

ID	Threat	Component	Mitigation	Priority
D1	Token exhaustion attack	LiteLLM	Budget limits, rate limiting	HIGH
D2	Connection pool exhaustion	Agent Gateway	Connection limits, circuit breakers	HIGH
D3	GPU memory exhaustion	vLLM	Request queuing, memory limits	MEDIUM
D4	Cache poisoning	Redis	Authentication, input validation	MEDIUM

4.6 Elevation of Privilege#

ID	Threat	Component	Mitigation	Priority
E1	Cross-tenant data access	LiteLLM	Team isolation, Cedar policies	CRITICAL
E2	Vault token escalation	Vault	Least-privilege policies, token TTL	HIGH
E3	Container escape	All Pods	Seccomp, AppArmor, non-root	HIGH
E4	RBAC bypass	Kubernetes	Audit logging, admission webhooks	HIGH

5. Attack Trees#

5.1 Credential Theft Attack Tree#

Goal: Steal API Provider Keys
├── 1. Extract from Vault [CRITICAL]
│   ├── 1.1 Compromise Vault token
│   │   ├── 1.1.1 Steal from pod environment
│   │   ├── 1.1.2 Intercept token renewal
│   │   └── 1.1.3 Exploit Vault vulnerability
│   └── 1.2 Access Vault storage backend
│       └── 1.2.1 Compromise etcd/consul
├── 2. Extract from Memory [HIGH]
│   ├── 2.1 Container escape + memory dump
│   └── 2.2 Core dump analysis
├── 3. Extract from Logs [MEDIUM]
│   ├── 3.1 Access application logs
│   └── 3.2 Access OTel traces
└── 4. Man-in-the-Middle [MEDIUM]
    ├── 4.1 Compromise service mesh
    └── 4.2 DNS hijacking

5.2 Budget Bypass Attack Tree#

Goal: Exceed Budget Without Detection
├── 1. Direct API Access [HIGH]
│   ├── 1.1 Bypass LiteLLM proxy
│   │   └── 1.1.1 Direct vLLM access
│   └── 1.2 Use stolen provider key
├── 2. Attribution Evasion [MEDIUM]
│   ├── 2.1 Spoof user/team headers
│   └── 2.2 Use shared/default key
└── 3. Exploit Async Tracking [MEDIUM]
    ├── 3.1 Race condition in spend update
    └── 3.2 Overwhelm tracking pipeline

6. Security Controls#

6.1 Preventive Controls#

Control	Threats Mitigated	Implementation
mTLS	S2, I1	Istio/Linkerd service mesh
Cedar RBAC	E1, E4	Agent Gateway policies
Network Policies	D1, E1	Kubernetes NetworkPolicy
Pod Security Standards	E3	Restricted PSS profile
Input Validation	T1	LiteLLM guardrails
Budget Enforcement	D1	LiteLLM + Budget Webhook

6.2 Detective Controls#

Control	Threats Detected	Implementation
Audit Logging	R1, R2, R3	OTel → Loki/Splunk
Anomaly Detection	S1, D1	Prometheus alerts
Secret Scanning	I1	Trivy, Gitleaks
Runtime Security	E3, T3	Falco

6.3 Corrective Controls#

Control	Response	Implementation
Key Rotation	Credential compromise	Vault auto-rotate
Circuit Breaker	Service degradation	Agent Gateway
Auto-scaling	Load spike	KEDA + HPA
Incident Runbooks	Security events	PagerDuty integration

7. Risk Register#

Risk ID	Description	Likelihood	Impact	Risk Level	Mitigation Status
R-001	API key exposure via logs	Medium	Critical	HIGH	Mitigated (log scrubbing)
R-002	Cross-tenant data access	Low	Critical	MEDIUM	Mitigated (Cedar policies)
R-003	Budget bypass via direct access	Medium	High	HIGH	Mitigated (NetworkPolicy)
R-004	Prompt injection attacks	High	Medium	HIGH	Mitigated (LLM Guard + Presidio)
R-005	Supply chain compromise	Low	Critical	MEDIUM	Mitigated (image signing)

8. Compliance Mapping#

Requirement	Control	Evidence
SOC2 CC6.1	Access control	Cedar policies, Vault ACLs
SOC2 CC6.6	Encryption	TLS 1.3, Vault encryption
SOC2 CC7.2	Monitoring	OTel traces, Prometheus alerts
GDPR Art. 32	Data protection	Encryption, access logging
PCI-DSS 3.4	Key management	Vault, rotation policies

9. Review and Updates#

9.1 Review Triggers#

Quarterly scheduled review
New component addition
Security incident
Significant architecture change
Compliance audit finding

9.2 Change Log#

Date	Version	Author	Changes
2026-02	1.0	Platform Team	Initial threat model

Appendix A: DREAD Scoring#

Threat	Damage	Reproducibility	Exploitability	Affected Users	Discoverability	Score
I1 - Key exposure	10	8	6	10	7	8.2
E1 - Cross-tenant	10	5	4	8	3	6.0
D1 - Token exhaust	6	9	8	10	8	8.2
T1 - Prompt inject	7	8	7	6	9	7.4