You can run OpenClaw agents for $10-25/month instead of $200-5,000/month by choosing the right model for each task. GPT-4o mini ($0.15/M input) and DeepSeek ($0.28/M input) handle simple tasks at a fraction of the cost of Claude Opus ($5/M input). The key is routing: use cheap models for heartbeats and boilerplate, expensive models only for reasoning. ClawCap automates this at the proxy layer.
The single biggest variable in your OpenClaw bill is not how many hours your agent runs. It is which model processes each request. The spread between the cheapest and most expensive viable models in early 2026 is over 30x on input tokens and nearly 60x on output tokens.
Sending the same prompt to GPT-4o mini versus Claude Opus costs you $0.15 versus $5 per million input tokens. If your agent is sending 50 million input tokens per month — a realistic number for active development — that is the difference between $7.50 and $250 just on input.
Here is every model worth considering for OpenClaw agents, sorted by input cost. All prices are per million tokens, current as of March 2026.
| Model | Input / M | Output / M | Provider |
|---|---|---|---|
| GPT-4o mini | $0.15 | $0.60 | OpenAI |
| MiniMax | $0.27 | $1.00 | MiniMax |
| DeepSeek Chat | $0.28 | $0.42 | DeepSeek |
| Gemini 2.5 Flash | $0.30 | $2.50 | Google (free tier available) |
| Mistral Large | $0.50 | $1.50 | Mistral |
| Groq (Llama 3.3 70B) | $0.59 | $0.79 | Groq |
| Kimi K2.5 | $0.60 | $3.00 | Moonshot |
| GLM-4.7 | $0.60 | $2.20 | Zhipu AI |
| GPT-4o | $2.50 | $10.00 | OpenAI |
| Claude 4 Sonnet | $3.00 | $15.00 | Anthropic |
| Grok 3 | $3.00 | $15.00 | xAI |
| Claude 4 Opus | $5.00 | $25.00 | Anthropic |
Look at the gap between the top and bottom of that table. Claude Opus output tokens cost nearly 60x more than DeepSeek output tokens ($25 versus $0.42 per million). Even Sonnet output costs 36x more than DeepSeek. These ratios determine whether your monthly bill is a rounding error or a rent payment.
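To see how fast those ratios compound, price a month of usage against the table. A quick sketch (the 50M input / 10M output monthly figures are illustrative, not a measurement):

```python
# Per-million-token prices from the table above: (input, output) in USD.
PRICES = {
    "gpt-4o-mini":     (0.15, 0.60),
    "deepseek-chat":   (0.28, 0.42),
    "claude-4-sonnet": (3.00, 15.00),
    "claude-4-opus":   (5.00, 25.00),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Monthly cost in USD, given usage in millions of tokens."""
    inp, out = PRICES[model]
    return input_m * inp + output_m * out

# 50M input / 10M output tokens per month on each model:
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 50, 10):,.2f}")
```

At that volume the same workload costs $13.50 on GPT-4o mini and $500 on Claude Opus, which is the whole argument for routing in two lines of arithmetic.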
Across developer communities and forums in early 2026, self-reported OpenClaw spending falls into three distinct clusters depending on model choice and discipline.
The difference between Cluster 1 and Cluster 3 is not productivity. Studies of agent-assisted development show diminishing returns past 4-6 hours of active agent use per day. The difference is waste — loops, heartbeats, and using expensive models for tasks that do not need them.
Not every API call from an OpenClaw agent is a complex reasoning task. In fact, based on request log analysis, roughly 60-70% of calls from a typical coding agent are simple operations that any model can handle. Here is how tasks break down by model requirement.
Tasks that benefit from Claude Opus / GPT-4o ($2.50-25/M):
Tasks that work fine on Claude Sonnet / Mistral Large ($0.50-15/M):
Tasks that any cheap model handles ($0.15-0.60/M):
That bottom category — the cheap stuff — accounts for the majority of API calls by volume. Every heartbeat ping, every "read this file and tell me what is on line 47," every status check. These are the calls that drain your budget silently when routed to an expensive model.
Let's do the math on heartbeat waste specifically, because it is the most common source of invisible cost.
An active OpenClaw agent sends periodic calls to maintain context and check status. These are typically short — 200-500 input tokens, 50-100 output tokens. A busy agent might send 10-20 of these per minute during active sessions.
Assume 15 heartbeat calls per minute, averaging 350 input tokens and 75 output tokens each. Over an 8-hour working day, that is 7,200 calls. The token totals: 2.52M input tokens and 540K output tokens per day, just from heartbeats.
Rerouting heartbeats from Opus to DeepSeek saves $25.17 per day. Over a month of weekday development (22 working days), that is roughly $554 in savings from a single optimization. Rerouting Sonnet heartbeats to GPT-4o mini saves $14.96/day, or about $329/month.
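Those numbers are easy to reproduce (prices from the table above, call volume as assumed):

```python
# Heartbeat assumptions from above: 15 calls/min for 8 hours,
# 350 input / 75 output tokens per call.
CALLS_PER_DAY = 15 * 60 * 8              # 7,200 calls
INPUT_M = CALLS_PER_DAY * 350 / 1e6      # 2.52M input tokens/day
OUTPUT_M = CALLS_PER_DAY * 75 / 1e6      # 0.54M output tokens/day

def daily_cost(input_price: float, output_price: float) -> float:
    """Daily heartbeat cost in USD at the given per-million-token prices."""
    return INPUT_M * input_price + OUTPUT_M * output_price

opus = daily_cost(5.00, 25.00)       # $26.10/day
deepseek = daily_cost(0.28, 0.42)    # ~$0.93/day
print(f"Opus heartbeats:     ${opus:.2f}/day")
print(f"DeepSeek heartbeats: ${deepseek:.2f}/day")
print(f"Savings:             ${opus - deepseek:.2f}/day")
```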
This is not theoretical. These are the exact calculations ClawCap performs when it detects heartbeat patterns and reroutes them to the cheapest configured model.
ClawCap sits as a proxy between OpenClaw and every API provider. It sees every request before it leaves your machine. The heartbeat detection system looks at three signals.
First, frequency. If an agent is making calls more often than once every 3 seconds with payloads under 500 tokens, it flags the pattern. Second, content similarity. If the last 5 calls have a cosine similarity above 0.85 on their input content, they are likely heartbeats. Third, response utility. If the responses are consistently short (under 100 tokens) and do not contain code blocks or structured output, the calls are probably status checks.
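A sketch of those three checks, using the thresholds from the text (the bag-of-words similarity and word-count token estimate here are simplified stand-ins for whatever ClawCap actually uses internally):

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two prompts."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def looks_like_heartbeat(interval_s: float, input_tokens: int,
                         recent_inputs: list[str], response: str) -> bool:
    """Apply the three signals: frequency, content similarity, response utility."""
    # Signal 1: calls more often than every 3s with payloads under 500 tokens.
    frequent = interval_s < 3 and input_tokens < 500
    # Signal 2: the latest input closely matches each of the recent ones.
    similar = len(recent_inputs) >= 2 and all(
        cosine_sim(recent_inputs[-1], prev) > 0.85
        for prev in recent_inputs[:-1])
    # Signal 3: short response with no code blocks or structured output.
    low_utility = len(response.split()) < 100 and "```" not in response
    return frequent and similar and low_utility
```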
When a request matches the heartbeat pattern, ClawCap rewrites the model field in the API request to the cheapest available provider before forwarding. The agent does not know the difference. It sends a request for claude-sonnet-4-6, but ClawCap routes it to DeepSeek. The response format is normalized so OpenClaw processes it identically.
This is transparent to the agent. It keeps working. Your bill drops.
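The rewrite itself is a one-field change to the outgoing request. A hypothetical sketch (field names follow the common chat-completion request shape; `_requested_model` is an invented bookkeeping field, not part of any provider API):

```python
# The cheapest configured fallback model; configurable in a real deployment.
CHEAPEST_MODEL = "deepseek-chat"

def reroute_if_heartbeat(request: dict, is_heartbeat: bool) -> dict:
    """Return the request to forward upstream, swapping the model if flagged."""
    if is_heartbeat:
        request = dict(request)                         # don't mutate the caller's copy
        request["_requested_model"] = request["model"]  # keep original for logging
        request["model"] = CHEAPEST_MODEL
    return request

req = {"model": "claude-sonnet-4-6",
       "messages": [{"role": "user", "content": "status?"}]}
routed = reroute_if_heartbeat(req, is_heartbeat=True)
print(routed["model"])  # deepseek-chat
```

The agent's copy of the request is untouched, which is why it never notices the swap.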
Based on the pricing data above and real usage patterns, here is the configuration that minimizes cost while maintaining quality for coding agents.
Primary model: Claude Sonnet ($3/$15 per M). This handles 80% of actual coding tasks. It is fast, capable, and the sweet spot between cost and quality for code generation, review, and debugging.
Upgrade model: Claude Opus ($5/$25 per M). Reserved for tasks you explicitly flag as complex. Multi-file refactors, architectural decisions, difficult bugs. Use this maybe 5-10% of the time.
Heartbeat/simple model: GPT-4o mini ($0.15/$0.60 per M). All heartbeats, status checks, file reads, and simple operations get routed here automatically. This tier handles 60-70% of total API calls by volume but a tiny fraction of the cost.
Free tier option: Gemini 2.5 Flash. Google offers a free tier for Gemini Flash that can absorb a significant number of low-value calls. If you are budget-constrained, configure this as your heartbeat model before DeepSeek. The free tier has rate limits, but heartbeats are small enough to stay within them for most users.
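That four-tier recommendation can be expressed as a small routing policy. A sketch (the tier names and model IDs are illustrative, not ClawCap's actual configuration keys):

```python
# Tier policy matching the recommendation above: Sonnet as the default,
# Opus for flagged-complex work, free-tier Gemini first for heartbeats
# with GPT-4o mini as the paid fallback.
ROUTING_POLICY = {
    "primary":   "claude-sonnet-4-6",
    "upgrade":   "claude-opus-4",
    "heartbeat": ["gemini-2.5-flash", "gpt-4o-mini"],
}

def pick_model(task_class: str) -> str:
    """Return the first-choice model for a task class."""
    choice = ROUTING_POLICY[task_class]
    return choice[0] if isinstance(choice, list) else choice
```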
With this setup and a $5/day hard cap in ClawCap, a typical developer's monthly bill lands in the $10-25 range quoted at the top, rather than the hundreds it would be on an all-Opus configuration.
Google's free tier for Gemini 2.5 Flash is the most underused option in the OpenClaw ecosystem right now. You get 10 requests per minute and 250 requests per day at no cost. After that, paid pricing kicks in at $0.30/M input.
For heartbeat routing, 250 free requests per day covers the first stretch of each day's status checks. If your agent sends 15 heartbeats per minute for 8 hours, that is 7,200 requests, so you would exhaust the free tier within the first 20-25 minutes and pay for the remaining ~6,950 at $0.30/M input. Still far cheaper than Sonnet.
The practical move is to stack providers: route heartbeats to Gemini Flash first, then overflow to DeepSeek once you hit the rate limit or the daily quota. Your first 250 heartbeats each day cost literally nothing.
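A sketch of that stacking logic with a simple per-day counter (the 250-request figure is Gemini's free daily quota from above; a real proxy would also persist the counter across restarts and reset it at midnight):

```python
# Route heartbeats to the free tier until the daily quota runs out,
# then overflow to the cheapest paid provider.
FREE_DAILY_QUOTA = 250

class HeartbeatRouter:
    def __init__(self) -> None:
        self.free_used_today = 0

    def route(self) -> str:
        if self.free_used_today < FREE_DAILY_QUOTA:
            self.free_used_today += 1
            return "gemini-2.5-flash"   # free tier
        return "deepseek-chat"          # paid overflow

router = HeartbeatRouter()
models = [router.route() for _ in range(300)]
# The first 250 calls go to Gemini; the remaining 50 overflow to DeepSeek.
```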
Cheap models are not free of tradeoffs. You should know what you are giving up.
Context window limits. DeepSeek Chat has a 64K context window versus Claude's 200K. For long conversations or large codebase analysis, you may hit the limit. Heartbeats are short, so this rarely matters for routed calls, but it matters if you try to use DeepSeek as your primary coding model.
Code quality on complex tasks. GLM-4.7 and MiniMax produce noticeably worse code on multi-step reasoning tasks compared to Sonnet. They are fine for "read this file and extract the function signature" but struggle with "refactor this module to use the strategy pattern." Do not use them for tasks above their capability.
Latency. Groq is extremely fast (sub-second for short responses) but serves only the open-weight models it hosts. DeepSeek can have variable latency during peak hours. For heartbeats, latency does not matter. For interactive coding where you are waiting on each response, an extra 2-3 seconds per call adds up.
Rate limits. Cheap providers often have tighter rate limits, especially on free tiers. If your agent spikes to 30 requests per minute during a complex task, you might hit rate limits on DeepSeek before you would on Anthropic. ClawCap handles this by falling back to the next cheapest provider when a rate limit is returned.
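The fallback behavior can be sketched as a cheapest-first loop over providers (the `send` callable and the model list are stand-ins, not ClawCap's real interface):

```python
# Providers ordered cheapest first; move down the list on HTTP 429.
PROVIDERS_BY_PRICE = ["deepseek-chat", "gpt-4o-mini", "claude-sonnet-4-6"]

def send_with_fallback(request: dict, send) -> tuple[str, object]:
    """Try each provider in price order; skip any that return 429."""
    for model in PROVIDERS_BY_PRICE:
        status, body = send({**request, "model": model})
        if status != 429:            # not rate-limited: use this response
            return model, body
    raise RuntimeError("all providers rate-limited")
```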
OpenClaw supports multiple provider configurations in ~/.openclaw/openclaw.json. You add each provider with its base URL and API key. When you point all base URLs to ClawCap's proxy at http://127.0.0.1:5858, the proxy handles routing decisions based on the model ID in each request.
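A minimal illustration of the idea, with every provider pointed at the local proxy (the field names here are illustrative, not OpenClaw's exact schema; check the OpenClaw docs for the real key names, and the keys shown are placeholders):

```json
{
  "providers": {
    "anthropic": { "baseUrl": "http://127.0.0.1:5858", "apiKey": "sk-ant-..." },
    "openai":    { "baseUrl": "http://127.0.0.1:5858", "apiKey": "sk-..." },
    "deepseek":  { "baseUrl": "http://127.0.0.1:5858", "apiKey": "sk-..." }
  }
}
```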
ClawCap reads the model field, detects the provider (Anthropic, OpenAI, DeepSeek, etc.) from the model-name prefix, and forwards the request to the correct upstream API. It can also override the model field on heartbeat-detected requests, swapping in a cheaper model before forwarding.
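Prefix-based provider detection is a small lookup. A sketch (the prefix table is illustrative; ClawCap's real mapping may differ):

```python
# Map model-name prefixes to upstream API hosts.
UPSTREAMS = {
    "claude":   "https://api.anthropic.com",
    "gpt":      "https://api.openai.com",
    "deepseek": "https://api.deepseek.com",
    "gemini":   "https://generativelanguage.googleapis.com",
}

def upstream_for(model: str) -> str:
    """Resolve a model ID to its provider's base URL."""
    for prefix, url in UPSTREAMS.items():
        if model.startswith(prefix):
            return url
    raise ValueError(f"unknown provider for model {model!r}")
```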
The net effect: your OpenClaw agent thinks it is always talking to one endpoint. ClawCap is silently routing each call to the right provider at the right price, enforcing your daily cap, and blocking loops before they drain your wallet.
Automatic heartbeat rerouting. Hard daily and monthly caps. Loop detection that stops $800 weekends before they start. Free tier included.
Get ClawCap