Burn sits between your app and any LLM. It classifies every prompt, picks the cheapest model that can handle it, and logs exactly what you spent. Cloud or local, one routing layer.
Burn reads your prompt and classifies it: SIMPLE, MEDIUM, COMPLEX, or CRITICAL. No guessing.
Based on the tier, Burn picks the optimal model. Simple tasks go to cheap fast models. Complex reasoning goes to frontier models.
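The classify-then-route flow can be sketched as a two-step lookup. This is a hypothetical illustration, not Burn's actual classifier: the keyword/length heuristic and the tier-to-model table below are stand-ins, using model names from the pricing table.

```python
# Hypothetical sketch of tier classification and model routing.
# The heuristic and tier-to-model mapping are illustrative only.

TIER_MODELS = {
    "SIMPLE": "llama-3.3-70b",    # cheap, fast
    "MEDIUM": "deepseek-v3",      # balanced
    "COMPLEX": "claude-sonnet",   # capable
    "CRITICAL": "claude-opus",    # best available
}

def classify(prompt: str) -> str:
    """Toy stand-in for Burn's prompt classifier."""
    p = prompt.lower()
    if any(w in p for w in ("prove", "legal", "medical", "production incident")):
        return "CRITICAL"
    if any(w in p for w in ("debug", "refactor", "analyze")):
        return "COMPLEX"
    if len(prompt) > 200:
        return "MEDIUM"
    return "SIMPLE"

def route(prompt: str) -> str:
    """Pick the cheapest model whose tier can handle the prompt."""
    return TIER_MODELS[classify(prompt)]

print(route("Summarize this email"))       # -> llama-3.3-70b
print(route("Debug this race condition"))  # -> claude-sonnet
```

A real classifier would weigh more signals than keywords and length, but the shape is the same: every prompt maps to exactly one tier, and every tier maps to one preferred model.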
Every call is logged to SQLite. See exactly what you spent, what you saved, and where your budget is going.
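Per-call logging reduces to one insert per request plus aggregate queries. A minimal sketch, assuming a hypothetical `calls` schema (Burn's actual table layout may differ):

```python
# Sketch of per-call cost logging to SQLite. The `calls` schema and
# `log_call` helper are hypothetical, not Burn's real internals.
import sqlite3

conn = sqlite3.connect(":memory:")  # Burn would use a file on disk
conn.execute("""
    CREATE TABLE IF NOT EXISTS calls (
        ts         TEXT DEFAULT CURRENT_TIMESTAMP,
        model      TEXT,
        tier       TEXT,
        tokens_in  INTEGER,
        tokens_out INTEGER,
        cost_usd   REAL
    )
""")

def log_call(model, tier, tokens_in, tokens_out, price_per_1m):
    """Record one LLM call and its dollar cost."""
    cost = (tokens_in + tokens_out) / 1_000_000 * price_per_1m
    conn.execute(
        "INSERT INTO calls (model, tier, tokens_in, tokens_out, cost_usd) "
        "VALUES (?, ?, ?, ?, ?)",
        (model, tier, tokens_in, tokens_out, cost),
    )
    conn.commit()
    return cost

log_call("llama-3.3-70b", "SIMPLE", 1_200, 300, 0.10)
total = conn.execute("SELECT SUM(cost_usd) FROM calls").fetchone()[0]
print(f"total spend: ${total:.6f}")  # total spend: $0.000150
```

Because every row carries model, tier, and cost, any budget question afterwards is a single SQL query.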
Four tiers: SIMPLE (fast/cheap), MEDIUM (balanced), COMPLEX (capable), CRITICAL (best available). Burn classifies in milliseconds.
Route to any cloud model via OpenRouter, or run locally with Ollama and LM Studio. One interface, every model.
Live savings banner. Actual spend. Request count. Routing distribution by model. Cost by task type. All from your SQLite DB.
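Each of those dashboard numbers is an aggregate over the call log. A sketch in plain SQL against a hypothetical `calls` table (Burn's real schema and queries may differ):

```python
# Sketch: dashboard aggregates as plain SQL over a hypothetical
# `calls` table seeded with example rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (model TEXT, tier TEXT, cost_usd REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("llama-3.3-70b", "SIMPLE", 0.0002),
        ("llama-3.3-70b", "SIMPLE", 0.0003),
        ("claude-sonnet", "COMPLEX", 0.0090),
    ],
)

# Actual spend and request count
spend, count = conn.execute(
    "SELECT SUM(cost_usd), COUNT(*) FROM calls"
).fetchone()

# Routing distribution by model
dist = conn.execute(
    "SELECT model, COUNT(*) FROM calls GROUP BY model ORDER BY model"
).fetchall()

print(f"spend ${spend:.4f} across {count} requests")
print(dist)
```

Cost by task type is the same `GROUP BY`, just on `tier` instead of `model`.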
Route simple tasks to Ollama or LM Studio — completely free. Burn tracks local calls too, so you see the full picture.
| Tier | Model | Provider | Cost (per 1M tokens) | Best For |
|---|---|---|---|---|
| Budget | llama-3.3-70b | OpenRouter | $0.10 | Simple tasks, summaries |
| Standard | deepseek-v3 | OpenRouter | $0.27 | Code, analysis |
| Standard | claude-haiku | Anthropic | $0.80 | Balanced tasks |
| Premium | claude-sonnet | Anthropic | $3 | Complex reasoning |
| Premium | gpt-4o | OpenAI | $5 | Multimodal tasks |
| Elite | claude-opus | Anthropic | $15 | Critical, highest stakes |
| Local | Ollama / LM Studio | Local | Free | Any task, private |
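The prices above make the savings easy to reason about. A worked example under an assumed (hypothetical) monthly traffic mix — real workloads will differ:

```python
# Worked example of the savings implied by the pricing table,
# under an assumed traffic mix. Prices are $ per 1M tokens.
PRICE = {
    "llama-3.3-70b": 0.10,
    "deepseek-v3": 0.27,
    "claude-sonnet": 3.0,
    "claude-opus": 15.0,
}

# Hypothetical mix: millions of tokens per model after routing
routed = {
    "llama-3.3-70b": 6.0,
    "deepseek-v3": 2.5,
    "claude-sonnet": 1.0,
    "claude-opus": 0.5,
}

routed_cost = sum(PRICE[m] * mtok for m, mtok in routed.items())
naive_cost = PRICE["claude-opus"] * sum(routed.values())  # everything on opus

print(f"routed ${routed_cost:.2f} vs all-opus ${naive_cost:.2f} "
      f"({1 - routed_cost / naive_cost:.0%} saved)")
```

With most tokens landing on the budget tier, routing costs a small fraction of sending every request to the top model, which is the whole pitch.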