FinOps KPI Definitions#
Document Information#
| Field | Value |
|---|---|
| Version | 1.0 |
| Last Updated | 2026-02 |
| Owner | FinOps Team |
| Review Cycle | Monthly |
Overview#
This document defines the Key Performance Indicators (KPIs) for AI Control Plane Platform cost management. These metrics enable teams to understand, optimize, and govern LLM spending.
1. Cost Efficiency KPIs#
1.1 Cost Per Request (CPR)#
Definition: Average cost incurred per API request to the gateway.
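Prometheus Query:
# Same expression as the aigateway:cost_per_request:1h recording rule in Appendix A
sum(increase(litellm_spend_total[1h])) / sum(increase(litellm_requests_total[1h]))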
| Target | Good | Warning | Critical |
|---|---|---|---|
| < $0.01 | < $0.005 | $0.01-0.05 | > $0.05 |
Use Cases:
- Identify expensive usage patterns
- Compare efficiency across teams
- Track optimization improvements
1.2 Cost Per 1K Tokens (CPT)#
Definition: Average cost per 1,000 tokens processed (input + output combined).
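Prometheus Query:
# Same expression as the aigateway:cost_per_1k_tokens:1h recording rule in Appendix A
sum(increase(litellm_spend_total[1h])) / (sum(increase(litellm_tokens_total[1h])) / 1000)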
| Model Tier | Expected CPT |
|---|---|
| Budget (Haiku, GPT-4o-mini) | $0.0002 - $0.001 |
| Standard (Sonnet, GPT-4o) | $0.002 - $0.01 |
| Premium (Opus, GPT-4-turbo) | $0.01 - $0.03 |
| Self-hosted (Llama) | $0.00005 - $0.0003 |
Use Cases:
- Model cost comparison
- Vendor negotiation benchmarks
- Self-hosted vs API cost analysis
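As a worked illustration of the CPT definition in this subsection, the sketch below computes the metric from a spend total and token counts. The function name and the sample figures are hypothetical and are not taken from the platform.

```python
from typing import Optional


def cost_per_1k_tokens(spend_usd: float, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Average cost per 1,000 tokens, counting input and output together."""
    total_tokens = input_tokens + output_tokens
    if total_tokens == 0:
        return None  # no tokens in the window, so CPT is undefined
    return spend_usd / (total_tokens / 1000)


# Hypothetical window: $12.50 spent on 2,000,000 input and 500,000 output tokens.
cpt = cost_per_1k_tokens(12.50, 2_000_000, 500_000)
print(f"CPT = ${cpt:.4f} per 1K tokens")
```

With these sample numbers the result is $0.005 per 1K tokens, which sits inside the Standard tier band of the table above.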
1.3 Output/Input Token Ratio#
Definition: Ratio of output tokens to input tokens, indicating response verbosity.
Prometheus Query:
sum(increase(litellm_tokens_total{type="output"}[24h]))
/ sum(increase(litellm_tokens_total{type="input"}[24h]))
| Use Case | Expected Ratio |
|---|---|
| Chat/Q&A | 0.5 - 2.0 |
| Code Generation | 2.0 - 5.0 |
| Summarization | 0.1 - 0.3 |
| Translation | 0.8 - 1.2 |
Use Cases:
- Identify verbose prompts/responses
- Optimize prompt engineering
- Detect potential misuse
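One way to act on the expected ratios is to compare each workload's measured ratio against its band. The sketch below does that; the dictionary keys and the function name are hypothetical, and the ranges are copied from the table in this subsection.

```python
# Expected output/input ratio ranges, copied from the table in this subsection.
EXPECTED_RATIO = {
    "chat": (0.5, 2.0),
    "code_generation": (2.0, 5.0),
    "summarization": (0.1, 0.3),
    "translation": (0.8, 1.2),
}


def ratio_check(use_case, input_tokens, output_tokens):
    """Return (ratio, within_expected_range) for one workload."""
    if input_tokens == 0:
        raise ValueError("ratio is undefined without input tokens")
    ratio = output_tokens / input_tokens
    low, high = EXPECTED_RATIO[use_case]
    return ratio, low <= ratio <= high


# Hypothetical summarization workloads: the second one is unusually verbose.
print(ratio_check("summarization", 40_000, 8_000))
print(ratio_check("summarization", 40_000, 30_000))
```

A workload that falls outside its band is a prompt for review rather than proof of misuse, since the bands are only expectations.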
1.4 Cache Hit Rate#
Definition: Percentage of requests served from cache vs. calling the LLM provider.
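Prometheus Query:
# Same expression as the aigateway:cache_hit_rate:1h recording rule in Appendix A
sum(rate(litellm_cache_hits_total[1h])) / sum(rate(litellm_requests_total[1h]))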
| Target | Good | Needs Improvement |
|---|---|---|
| > 20% | > 30% | < 10% |
Use Cases:
- Validate caching effectiveness
- Identify cacheable workloads
- Reduce redundant API calls
2. Budget Management KPIs#
2.1 Budget Utilization Rate#
Definition: Percentage of allocated budget consumed within the period.
Prometheus Query:
# By team
sum(litellm_team_spend_total) by (team) / sum(litellm_team_budget_total) by (team)
# Global
sum(litellm_spend_total) / sum(litellm_global_budget_total)
| Status | Utilization |
|---|---|
| 🟢 Healthy | < 70% |
| 🟡 Warning | 70-90% |
| 🔴 Critical | > 90% |
Alerting Thresholds:
- Warning at 80%
- Critical at 95%
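The dashboard status bands and the alerting thresholds use different cut-offs, so a team can show a Warning status before any alert fires. The sketch below makes that distinction explicit; the function name, the return shape, and the handling of exact boundary values are assumptions for illustration only.

```python
def utilization_report(spend_usd, budget_usd):
    """Return (utilization, dashboard_status, alert_level) for one budget."""
    if budget_usd <= 0:
        raise ValueError("budget must be positive")
    utilization = spend_usd / budget_usd

    # Dashboard status bands: < 70% healthy, 70-90% warning, > 90% critical.
    if utilization < 0.70:
        status = "healthy"
    elif utilization <= 0.90:
        status = "warning"
    else:
        status = "critical"

    # Alerting thresholds: warning at 80%, critical at 95%.
    if utilization >= 0.95:
        alert = "critical"
    elif utilization >= 0.80:
        alert = "warning"
    else:
        alert = None
    return utilization, status, alert


# Hypothetical team: $7,500 spent of a $10,000 budget.
print(utilization_report(7_500, 10_000))
```

In this sample the team is at 75%, which is a Warning on the dashboard but still below the 80% alert threshold.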
2.2 Budget Burn Rate#
Definition: Rate at which budget is being consumed, used to project exhaustion date.
Prometheus Query:
# Daily burn rate
sum(increase(litellm_spend_total[24h]))
# Days until budget exhaustion
(sum(litellm_team_budget_total) - sum(litellm_team_spend_total))
/ sum(increase(litellm_spend_total[24h]))
| Projection | Action |
|---|---|
| > 30 days remaining | Normal |
| 15-30 days | Review usage |
| < 15 days | Urgent optimization needed |
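The projection above is remaining budget divided by daily burn. The sketch below reproduces that arithmetic and maps the result to the action column of the table; both function names and the sample figures are hypothetical.

```python
import math


def days_until_exhaustion(budget_usd, spent_usd, daily_burn_usd):
    """Remaining budget divided by the daily burn rate."""
    if daily_burn_usd <= 0:
        return math.inf  # nothing is being spent, so the budget never runs out
    return max(budget_usd - spent_usd, 0.0) / daily_burn_usd


def projection_action(days_remaining):
    """Map a projection to the action column of the table above."""
    if days_remaining > 30:
        return "Normal"
    if days_remaining >= 15:
        return "Review usage"
    return "Urgent optimization needed"


# Hypothetical team: $10,000 budget, $6,400 spent, burning $180 per day.
days = days_until_exhaustion(10_000, 6_400, 180)
print(days, projection_action(days))
```

The zero-burn guard matters in practice: the PromQL division returns no usable value when the 24h increase is zero, so a dashboard panel should treat that case as "no projected exhaustion".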
2.3 Budget Variance#
Definition: Difference between planned (forecasted) and actual spend.
| Variance | Assessment |
|---|---|
| Within ±10% | On track |
| +10% to +25% | Over budget |
| Above +25% | Significant overrun |
| Below -25% | Under-utilized |
Use Cases:
- Monthly/quarterly financial reporting
- Forecast accuracy improvement
- Resource allocation decisions
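This subsection gives no query for variance, so the sketch below assumes the common formulation: actual minus planned, as a percentage of planned. The function names are hypothetical. The table does not classify the range between 10% and 25% under plan, and the sketch reports that range as unclassified rather than guessing.

```python
def budget_variance_pct(planned_usd, actual_usd):
    """Variance as a percentage of plan (assumed formula: (actual - planned) / planned)."""
    if planned_usd <= 0:
        raise ValueError("planned spend must be positive")
    return (actual_usd - planned_usd) / planned_usd * 100


def assess_variance(variance_pct):
    """Map a variance percentage to the assessment column of the table above."""
    if variance_pct > 25:
        return "Significant overrun"
    if variance_pct > 10:
        return "Over budget"
    if variance_pct >= -10:
        return "On track"
    if variance_pct <= -25:
        return "Under-utilized"
    return "Unclassified"  # between -10% and -25% is not covered by the table


# Hypothetical month: $20,000 planned, $23,000 actual.
variance = budget_variance_pct(20_000, 23_000)
print(f"{variance:+.1f}% -> {assess_variance(variance)}")
```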
3. Usage Distribution KPIs#
3.1 Model Mix Distribution#
Definition: Percentage breakdown of requests/spend by model.
Prometheus Query:
# Spend distribution
sum(increase(litellm_spend_total[24h])) by (model)
/ sum(increase(litellm_spend_total[24h]))
# Request distribution
sum(increase(litellm_requests_total[24h])) by (model)
/ sum(increase(litellm_requests_total[24h]))
Healthy Distribution Example:
| Model Tier | Request % | Spend % |
|---|---|---|
| Budget | 60-70% | 10-20% |
| Standard | 25-35% | 40-50% |
| Premium | 5-10% | 30-40% |
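The two queries in this subsection divide each model's total by the overall total. The sketch below performs the same normalization on per-tier figures so that request share and spend share can be compared side by side; the helper name and all sample values are hypothetical.

```python
def distribution(values):
    """Convert absolute per-tier values into fractional shares of the total."""
    total = sum(values.values())
    if total == 0:
        return {tier: 0.0 for tier in values}
    return {tier: value / total for tier, value in values.items()}


# Hypothetical 24h totals per model tier.
requests = {"budget": 6_500, "standard": 3_000, "premium": 500}
spend_usd = {"budget": 150.0, "standard": 450.0, "premium": 400.0}

for name, data in (("requests", requests), ("spend", spend_usd)):
    shares = distribution(data)
    print(name, {tier: f"{share:.0%}" for tier, share in shares.items()})
```

In this sample the premium tier handles 5% of requests but 40% of spend, which is the kind of asymmetry the healthy distribution example anticipates.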
3.2 Team Cost Allocation#
Definition: Cost distribution across teams/departments.
Prometheus Query:
Reporting Format:
| Team | Monthly Spend | % of Total | Budget | Utilization |
|---|---|---|---|---|
| Engineering | $X,XXX | XX% | $X,XXX | XX% |
| Data Science | $X,XXX | XX% | $X,XXX | XX% |
| Product | $X,XXX | XX% | $X,XXX | XX% |
3.3 Provider Distribution#
Definition: Spend breakdown by LLM provider (OpenAI, Anthropic, XAI, Self-hosted).
Prometheus Query:
Strategic Targets:
| Provider | Target % | Rationale |
|---|---|---|
| Self-hosted | 40-60% | Cost control, data privacy |
| OpenAI | 20-30% | Specific capabilities |
| Anthropic | 15-25% | Quality, safety |
| XAI | 5-10% | Diversity, fallback |
4. Operational Efficiency KPIs#
4.1 Request Success Rate#
Definition: Percentage of requests that complete successfully (non-error).
Prometheus Query:
| SLO Target | Acceptable | Degraded |
|---|---|---|
| 99.9% | > 99.5% | < 99% |
4.2 Retry Rate#
Definition: Percentage of requests that required retries.
Prometheus Query:
| Status | Retry Rate |
|---|---|
| 🟢 Healthy | < 5% |
| 🟡 Elevated | 5-15% |
| 🔴 High | > 15% |
Cost Impact: Each retry adds ~100% cost for that request.
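Taking the ~100% figure at face value, the extra spend caused by retries can be estimated from the retry rate. The sketch below assumes every retry is billed at the full cost of the original request, which may overstate the impact when a provider does not charge for failed attempts. The function and parameter names are hypothetical.

```python
def retry_overhead_usd(base_spend_usd, retry_rate, retries_per_retried_request=1.0):
    """Estimated extra spend from retries.

    Assumes each retry costs as much as the original request.
    retry_rate is the fraction of requests that needed at least one retry.
    """
    if not 0.0 <= retry_rate <= 1.0:
        raise ValueError("retry_rate must be between 0 and 1")
    return base_spend_usd * retry_rate * retries_per_retried_request


# Hypothetical month: $1,000 of first-attempt spend with an 8% retry rate.
extra = retry_overhead_usd(1_000.0, 0.08)
print(f"estimated retry overhead: ${extra:.2f}")
```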
4.3 Fallback Rate#
Definition: Percentage of requests routed to fallback models.
Prometheus Query:
| Healthy | Warning |
|---|---|
| < 2% | > 5% |
Use Cases:
- Provider reliability assessment
- Capacity planning
- Cost impact analysis (fallbacks may cost more/less)
5. Unit Economics KPIs#
5.1 Cost Per User#
Definition: Average cost per active user per period.
Prometheus Query:
Benchmarks by User Type:
| User Type | Monthly Cost |
|---|---|
| Light (< 100 req/day) | $5-20 |
| Moderate (100-1000 req/day) | $20-100 |
| Heavy (> 1000 req/day) | $100-500 |
| Power (automated/batch) | $500+ |
5.2 Self-Hosted Cost Ratio#
Definition: Cost savings achieved by using self-hosted models vs. equivalent API calls.
Calculation Components:
- Infrastructure cost (GPU, compute, storage)
- Operational cost (team time, maintenance)
- Equivalent API cost at same token volume
Target: > 50% savings at scale
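The components above can be combined into a single savings figure. The sketch below assumes savings are measured as one minus the ratio of total self-hosted cost to the equivalent API cost; this subsection does not state the formula explicitly, so treat it as one reasonable reading. The function name and sample figures are hypothetical.

```python
def self_hosted_savings(infra_cost_usd, ops_cost_usd, equivalent_api_cost_usd):
    """Fraction saved by self-hosting versus paying API prices for the same tokens."""
    if equivalent_api_cost_usd <= 0:
        raise ValueError("equivalent API cost must be positive")
    self_hosted_total = infra_cost_usd + ops_cost_usd
    return 1 - self_hosted_total / equivalent_api_cost_usd


# Hypothetical month: $6,000 infrastructure, $2,000 operations,
# and $20,000 for the same token volume at API prices.
savings = self_hosted_savings(6_000, 2_000, 20_000)
print(f"savings = {savings:.0%}, meets target: {savings > 0.50}")
```

A negative result means self-hosting costs more than the API equivalent, which can happen at low volume when fixed infrastructure cost dominates.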
5.3 Cost Per Business Outcome#
Definition: Cost attributed to specific business outcomes (varies by use case).
| Use Case | Metric | Target CPO |
|---|---|---|
| Customer Support | Cost per ticket resolved | < $0.50 |
| Code Generation | Cost per PR assisted | < $2.00 |
| Content Creation | Cost per article | < $1.00 |
| Data Analysis | Cost per report | < $5.00 |
6. Reporting Schedule#
6.1 Real-time Dashboard#
- All KPIs updated every 30 seconds
- Alert thresholds monitored continuously
6.2 Daily Report#
- Cost Per Request trend
- Budget utilization by team
- Anomaly highlights
- Delivery: Slack #finops-daily at 9 AM IST
6.3 Weekly Report#
- Week-over-week cost comparison
- Model mix analysis
- Top 10 spenders
- Optimization recommendations
- Delivery: Email to stakeholders, Monday 10 AM IST
6.4 Monthly Report#
- Budget variance analysis
- Provider distribution trends
- Unit economics deep dive
- Forecast vs actual
- Chargeback data for finance
- Delivery: Confluence + email, 3rd business day
7. Grafana Dashboard Panels#
Recommended Dashboard Layout#
┌─────────────────────────────────────────────────────────────────────────────┐
│                      AI Control Plane FinOps Dashboard                      │
├─────────────┬─────────────┬─────────────┬─────────────┬─────────────────────┤
│ Total Spend │ Cost/Request│ Budget Used │ Cache Hit % │ Burn Rate           │
│ (24h)       │ (avg)       │ (month)     │             │ (days remaining)    │
├─────────────┴─────────────┴─────────────┴─────────────┴─────────────────────┤
│                     Hourly Spend by Model (stacked bar)                     │
├─────────────────────────────────────────┬───────────────────────────────────┤
│ Cost Distribution by Team (pie)         │ Budget Utilization (gauges)       │
├─────────────────────────────────────────┼───────────────────────────────────┤
│ Provider Spend Breakdown (pie)          │ Cost Per 1K Tokens (line)         │
├─────────────────────────────────────────┴───────────────────────────────────┤
│                  Token Usage: Input vs Output (timeseries)                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                            Top 10 Spenders Table                            │
└─────────────────────────────────────────────────────────────────────────────┘
8. Optimization Recommendations#
Based on KPI Analysis#
| KPI Signal | Recommendation |
|---|---|
| High CPR | Review prompt length, enable caching |
| Low cache hit rate | Identify repeated queries, tune TTL |
| High token ratio | Optimize prompts, set max_tokens |
| Budget > 80% | Alert team, review heavy users |
| High retry rate | Check provider health, adjust timeouts |
| Premium model > 30% spend | Evaluate if premium needed, try cheaper alternatives |
| Self-hosted < 40% | Migrate suitable workloads to vLLM |
Appendix A: Prometheus Recording Rules#
groups:
  - name: finops-recording-rules
    interval: 1m
    rules:
      - record: aigateway:cost_per_request:1h
        expr: sum(increase(litellm_spend_total[1h])) / sum(increase(litellm_requests_total[1h]))
      - record: aigateway:cost_per_1k_tokens:1h
        expr: sum(increase(litellm_spend_total[1h])) / (sum(increase(litellm_tokens_total[1h])) / 1000)
      - record: aigateway:budget_utilization:team
        expr: sum(litellm_team_spend_total) by (team) / sum(litellm_team_budget_total) by (team)
      - record: aigateway:daily_spend:model
        expr: sum(increase(litellm_spend_total[24h])) by (model)
      - record: aigateway:cache_hit_rate:1h
        expr: sum(rate(litellm_cache_hits_total[1h])) / sum(rate(litellm_requests_total[1h]))
      - record: aigateway:token_ratio:1h
        expr: sum(rate(litellm_tokens_total{type="output"}[1h])) / sum(rate(litellm_tokens_total{type="input"}[1h]))
Appendix B: SQL Queries for FinOps Reporter#
-- Daily cost by team
SELECT
    date,
    team_id,
    SUM(total_cost) as daily_cost,
    SUM(request_count) as requests,
    SUM(total_cost) / NULLIF(SUM(request_count), 0) as cost_per_request
FROM cost_tracking_daily
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY date, team_id
ORDER BY date DESC, daily_cost DESC;

-- Model efficiency comparison
SELECT
    model,
    SUM(total_cost) as total_cost,
    SUM(input_tokens + output_tokens) as total_tokens,
    SUM(total_cost) / NULLIF(SUM(input_tokens + output_tokens), 0) * 1000 as cost_per_1k_tokens
FROM cost_tracking_daily
WHERE date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY model
ORDER BY cost_per_1k_tokens;
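
-- Budget burn rate projection
-- Assumes team_budgets exposes team_id, budget_limit, and current_spend.
WITH daily_spend AS (
    SELECT
        team_id,
        -- Total spend divided by the number of days with data. A plain
        -- AVG(total_cost) would average individual rows, which understates
        -- daily spend if the table holds one row per team and model per day.
        SUM(total_cost) / NULLIF(COUNT(DISTINCT date), 0) as avg_daily_spend
    FROM cost_tracking_daily
    WHERE date >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY team_id
)
SELECT
    team_id,
    avg_daily_spend,
    budget_limit,
    -- NULLIF avoids division by zero for teams with no recent spend
    (budget_limit - current_spend) / NULLIF(avg_daily_spend, 0) as days_remaining
FROM daily_spend
JOIN team_budgets USING (team_id);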