Cheapest AI Models for Hermes Agent in 2026 (Under $1/M Tokens)
8 affordable models for Hermes Agent — DeepSeek V4 Flash at $0.10/M tokens, MiMo V2.5, MiniMax M2.7, and more. Pricing benchmarks and which to pick for coding vs chat.
Hermes Agent runs 24/7. It answers messages, executes scheduled jobs, runs skills, and searches the web around the clock. That kind of usage adds up fast if you pick the wrong model. I have been testing different providers on my Hermes instance for months, and the open source landscape has changed a lot since my earlier model recommendations for OpenClaw.
I narrowed it down to eight models that work with Hermes Agent, cost a fraction of what Claude or GPT API access runs you, and in some cases match or beat those proprietary models on coding and agent benchmarks.
What this covers
- Eight affordable open source models that work well with Hermes Agent
- Per-token pricing, context windows, and coding benchmarks for each
- Which model is cheapest, which is strongest, and which sits in the middle
- OpenCode Go as a single subscription that bundles all of these models
- How to set each model in Hermes Agent
If you have not installed Hermes Agent yet, the setup guide walks through the full process. For dashboard options to manage your agent from a browser, see the best Hermes dashboards roundup. And if you want the built-in web UI, the Hermes dashboard guide covers SSH tunnels, Caddy, and Docker deployment.
The models at a glance
| Model | Input $/M tokens | Output $/M tokens | Context | Best For |
|---|---|---|---|---|
| DeepSeek V4 Flash | $0.098 | $0.28 | 1M | Ultra-cheap, fast tasks |
| MiMo V2.5 | $0.14 | $0.28 | 1M | Budget omnimodal |
| MiniMax M2.7 | $0.25 | $1.00 | 204K | Cheapest quality, daily use |
| DeepSeek V4 Pro | $0.435 | $0.87 | 1M | Long context on a budget |
| Kimi K2.6 | $0.67 | $3.39 | 1T MoE | Coding + agent swarm |
| MiMo V2.5 Pro | $0.43 | $0.87 | 1M | Strongest agent, long tasks |
| GLM 5.1 | $0.98 | $3.08 | 202K | Best overall coding |
| MiniMax M3 | $0.30 | $1.20 | 1M | Frontier coding + multimodal |
Bottom line
Cheapest: DeepSeek V4 Flash at $0.098/M input — under $5/month for 24/7 use. Best value for quality: MiniMax M3 at $0.30/M input with 1M context and 59% SWE-Bench Pro. Most powerful: GLM 5.1 and MiMo V2.5 Pro.
1. MiniMax M2.7 — The budget pick
This is the model I keep coming back to for everyday Hermes use. At $0.30 per million input tokens and $1.20 per million output tokens, running Hermes 24/7 costs roughly $7 to $15 per month depending on how much you use it. That is less than a coffee subscription.
MiniMax M2.7 (10% Off)What M2.7 delivers
M2.7 is no slouch for the price. It scores 56.2% on SWE-Bench Pro, which puts it in the same range as models that cost three to five times as much. On the GDPval-AA benchmark for economically valuable tasks, it hits ELO 1495, the highest score among open source models. Debugging, root cause analysis, document generation, multi-step tool calls — it handles all of those without falling apart.
MiniMax also offers M2.7-highspeed, which runs the same model at higher throughput for a slightly higher price. For interactive Hermes sessions where response time matters, it is worth trying.
| Spec | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Context Window | 196K tokens |
| SWE-Bench Pro | 56.2% |
| GDPval-AA ELO | 1,495 |
| Input Cost | $0.30/M tokens |
| Output Cost | $1.20/M tokens |
| Cache Read | $0.059/M tokens |
Token Plan pricing
MiniMax offers a Token Plan with discounted rates. If you sign up through this link, you get 10% off the Token Plan.
Coding plan tip
The MiniMax Token Plan gives you a flat pool of tokens at a discount. For Hermes Agent, the base M2.7 plan covers most use cases. Subscribe through go.bitdoze.com/minimax for 10% off.
Setting M2.7 in Hermes
hermes config set model minimax/minimax-m2.7
Or set it through the model picker:
hermes model
Select MiniMax and authenticate with your API key.
2. DeepSeek V4 Pro — Long context, low price
DeepSeek V4 Pro gives you a 1 million token context window for $0.435 per million input tokens. Both the longest context and the second cheapest price on this list. If your Hermes conversations get long or you feed it large codebases, this is the model that handles it without losing track.
It runs 1.6 trillion total parameters with 49 billion activated per token and supports both thinking and non-thinking modes.
| Spec | Value |
|---|---|
| Architecture | MoE (1.6T total, 49B active) |
| Context Window | 1M tokens |
| AA Intelligence Index | 51.5 (better than 96% of models) |
| AA Agentic Index | 67.2 (better than 98% of models) |
| Input Cost | $0.435/M tokens |
| Output Cost | $0.87/M tokens |
| Cache Read | $0.003625/M tokens |
Where DeepSeek V4 Pro stands out
The hallucination rate on this model is 6.0% on the AA-Omniscience benchmark, the lowest on this list by far. When Hermes runs commands on a live server, that difference matters. It also scores 96.2% on tau2-Bench Telecom for conversational agent reliability.
Output cost is $0.87/M tokens, also the cheapest on this list. If your Hermes usage involves a lot of output — research summaries, code generation, document writing — DeepSeek V4 Pro keeps the bill down.
DeepSeek V4 Pro AnnouncementSetting DeepSeek V4 Pro in Hermes
hermes config set model deepseek/deepseek-v4-pro
Add your DeepSeek API key to ~/.hermes/.env:
echo "DEEPSEEK_API_KEY=your-key-here" >> ~/.hermes/.env
3. Kimi K2.6 — Agent swarm built in
Kimi K2.6 from Moonshot AI does something the other models on this list don’t: an agent swarm that spins up hundreds of parallel sub-agents to break down and tackle complex tasks on its own. You don’t have to decompose the work yourself — K2.6 figures it out.
| Spec | Value |
|---|---|
| Architecture | MoE (1T total, 32B active) |
| Context Window | 262K tokens |
| AA Intelligence Index | 53.9 (better than 98% of models) |
| AA Coding Index | 47.1 (better than 95% of models) |
| AA Agentic Index | 66.0 (better than 96% of models) |
| Input Cost | $0.75/M tokens |
| Output Cost | $3.50/M tokens |
Why K2.6 works for Hermes
K2.6 scores 91.1% on GPQA Diamond for graduate-level scientific reasoning — the highest on this list. It also handles Python, Rust, and Go coding across long-horizon tasks. The Agent Swarm feature means that when Hermes hits a complex task, K2.6 can internally decompose it and work on pieces in parallel.
Moonshot AI offers Kimi Code as a subscription service. Plans start at $15/month for the Moderato tier. If you use Hermes primarily for coding tasks, the Kimi Code subscription gives you a managed experience with K2.6 baked in.
Kimi K2.6 Model PageSetting K2.6 in Hermes
hermes config set model moonshotai/kimi-k2.6
Add your Moonshot API key:
echo "MOONSHOT_API_KEY=your-key-here" >> ~/.hermes/.env
4. Xiaomi MiMo V2.5 Pro — The agent powerhouse
MiMo V2.5 Pro is Xiaomi’s flagship model and one of the two strongest options on this list. It was built from the ground up for agent scenarios — complex software engineering, long-horizon tasks, and workflows that involve hundreds of tool calls in a single session.
During internal testing, MiMo V2.5 Pro completed a full SysY compiler in Rust in 4.3 hours with 672 tool calls, scoring a perfect 233/233 on the hidden test set. A task that takes undergraduate students at Peking University several weeks. It also built a working video editor web application — 8,192 lines of code across 1,868 tool invocations — in 11.5 hours of autonomous work.
MiMo V2.5 Pro Docs| Spec | Value |
|---|---|
| Context Window | 1M tokens |
| AA Intelligence Index | 53.8 (better than 98% of models) |
| AA Coding Index | 45.5 (better than 94% of models) |
| AA Agentic Index | 67.4 (better than 98% of models) |
| Input Cost (up to 256K) | $1.00/M tokens |
| Output Cost (up to 256K) | $3.00/M tokens |
| Input Cost (over 256K) | $2.00/M tokens |
| Output Cost (over 256K) | $6.00/M tokens |
| Cache Read | $0.20/M tokens |
Token efficiency advantage
MiMo V2.5 Pro is optimized for token efficiency. On the ClawEval agent benchmark, it achieves the same score as Kimi K2.6 while using 42% fewer tokens. That means the higher per-token price gets offset by needing fewer tokens to complete the same task.
The MiMo Token Plan starts at $72/year for the Lite tier (720 million credits). The Pro tier at $600/year gives 8.4 billion credits. Off-peak hours (16:00-24:00 UTC) get a 20% discount on top of the plan rate.
MiMo Token Plan ($2 Bonus)MiMo bonus
Sign up through go.bitdoze.com/mimo and get a $2 bonus credit on the MiMo Token Plan.
Setting MiMo V2.5 Pro in Hermes
hermes config set model xiaomi/mimo-v2.5-pro
Add your MiMo API key:
echo "MIMO_API_KEY=your-key-here" >> ~/.hermes/.env
MiMo V2.5 Pro is also available on OpenRouter, so if you already have Hermes configured with an OpenRouter key, you can select it from the model list without adding a new provider.
5. MiniMax M3 — Frontier coding with 1M context
MiniMax M3 is the latest flagship from MiniMax, released June 1, 2026. It is the first open-weight model to combine frontier coding, a 1-million-token context window, and native multimodality (image and video input). Built on MiniMax Sparse Attention (MSA), it cuts per-token compute at 1M context to one-twentieth of the prior M2.7 generation while running 9x faster prefill and 15x faster decoding. At the same $0.30/M input price as M2.7, M3 delivers significantly more capability.
MiniMax M3 (10% Off)| Spec | Value |
|---|---|
| Architecture | MiniMax Sparse Attention (MSA) |
| Context Window | 1M tokens |
| Max Output | 512K tokens |
| SWE-Bench Pro | 59.0% |
| Terminal-Bench 2.1 | 66.0% |
| BrowseComp | 83.5 |
| Multimodal | Native (image + video input) |
| Input Cost | $0.30/M tokens |
| Output Cost | $1.20/M tokens |
| Cache Read | $0.06/M tokens |
What M3 brings over M2.7
M3 keeps the same aggressive pricing as M2.7 but adds three things M2.7 never had:
- 1M context that actually works: MSA makes long-context affordable at 1/20 the compute cost of full attention. For Hermes conversations that span hours or involve large codebases, this matters.
- 59.0% SWE-Bench Pro: Beats GPT-5.5 and Gemini 3.1 Pro, approaches Claude Opus. M2.7 scored 56.2%.
- Native multimodality: Built-in image and video understanding, so Hermes can read screenshots, mockups, and documents without a separate vision model.
- 83.5 BrowseComp: Surpasses Opus 4.7’s 79.3 on web search and browsing tasks.
- 66.0% Terminal-Bench 2.1: Strong command-line agent performance for server tasks.
Long-horizon demonstrations
MiniMax backed M3’s launch with three autonomous task demonstrations:
- Paper reproduction: Autonomously reproduced an ICLR 2025 paper in 12 hours (18 commits, 23 figures)
- CUDA kernel optimization: Pushed FP8 hardware utilization from 7.6% to 71.3% over a 24-hour run
- Autonomous model training: Scored 0.37 on PostTrainBench, training another model end-to-end
Token Plan pricing
MiniMax offers monthly token plans for M3: $20/month (Plus, ~1.7B tokens), $50/month (Max, ~5.1B tokens), and $120/month (Ultra, ~9.8B tokens). Sign up through go.bitdoze.com/minimax for 10% off.
Setting M3 in Hermes
hermes config set model minimax/minimax-m3
M3 is available on OpenRouter as minimax/minimax-m3, so if you already have Hermes configured with an OpenRouter key, you can select it from the model list.
6. GLM 5.1 — The strongest overall
GLM 5.1 from Z.AI is the strongest model on this list. On SWE-Bench Pro, it scores 58.4% — ahead of GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. On LMArena Code, it ranks number one among open source models and number three globally. It can work autonomously on a single task for up to 8 hours, maintaining goal alignment without drifting.
GLM 5.1 Documentation| Spec | Value |
|---|---|
| Context Window | 200K tokens |
| Max Output | 128K tokens |
| SWE-Bench Pro | 58.4% |
| Positioning | Aligned with Claude Opus 4.6 |
| Input Cost | $1.05/M tokens |
| Output Cost | $3.50/M tokens |
What makes GLM 5.1 different
GLM 5.1 is the only model on this list that claims 8-hour sustained execution capability. In internal testing, it carried out 655 iterations on a vector database optimization task, boosting query throughput to 6.9x the initial production version. On KernelBench Level 3, it achieved a 3.6x geometric mean speedup through thousands of tool-invocation-driven optimizations.
For Hermes Agent, that means GLM 5.1 handles long-running scheduled tasks — morning briefings, server monitoring, complex research jobs — without losing the thread mid-execution.
Z.AI offers GLM Coding Plans starting at $18/month for the Lite tier. The Pro tier at $72/month includes MCP tools and faster generation speeds.
GLM Coding Plans (10% Off)GLM discount
Sign up through go.bitdoze.com/glm and get 10% off GLM Coding Plans.
Setting GLM 5.1 in Hermes
hermes config set model z-ai/glm-5.1
Add your Z.AI API key:
echo "ZAI_API_KEY=your-key-here" >> ~/.hermes/.env
GLM 5.1 is available on OpenRouter as well.
OpenCode Go — All five models, one subscription
If you do not want to manage separate API keys and billing for each provider, OpenCode Go bundles all six models into a single subscription for $10/month. For a detailed look at limits and real-world usage, see the OpenCode Go guide.
OpenCode GoWhat OpenCode Go includes
- $5 for your first month, then $10/month
- Access to MiniMax M3, MiniMax M2.7, MiMo V2.5 Pro, GLM 5.1, Kimi K2.6, DeepSeek V4 Pro, and more
- Models hosted in the US, EU, and Singapore for stable global access
- Zero-retention policy — providers do not use your data for training
Usage limits
OpenCode Go caps usage at $12 per 5 hours, $30 per week, and $60 per month. Cheaper models like MiniMax M2.7 let you make more requests within those limits. The estimated request counts:
| Model | Requests per 5 hours | Requests per week | Requests per month |
|---|---|---|---|
| MiniMax M3 | 3,400 | 8,500 | 17,000 |
| MiniMax M2.7 | 3,400 | 8,500 | 17,000 |
| DeepSeek V4 Pro | 3,450 | 8,550 | 17,150 |
| Kimi K2.6 | 1,150 | 2,880 | 5,750 |
| MiMo V2.5 Pro | 1,290 | 3,225 | 6,450 |
| GLM 5.1 | 880 | 2,150 | 4,300 |
At $10/month, OpenCode Go costs less than most individual provider plans and gives you the flexibility to switch between models depending on the task. For Hermes Agent, you can set the OpenCode Go endpoint as a custom provider and pick whichever model fits the job.
Setting up OpenCode Go in Hermes
Add the OpenCode Go endpoint to ~/.hermes/.env:
echo "OPENAI_BASE_URL=https://opencode.ai/zen/go/v1/chat/completions" >> ~/.hermes/.env
echo "OPENAI_API_KEY=your-opencode-go-key" >> ~/.hermes/.env
Then set the model:
hermes config set model opencode-go/minimax-m3
Switch models anytime:
hermes model
Head-to-head comparison
| Feature | MiniMax M2.7 | DeepSeek V4 Pro | Kimi K2.6 | MiMo V2.5 Pro | MiniMax M3 | GLM 5.1 |
|---|---|---|---|---|---|---|
| Input $/M | $0.25 | $0.435 | $0.67 | $0.43 | $0.30 | $0.98 |
| Output $/M | $1.00 | $0.87 | $3.39 | $0.87 | $1.20 | $3.08 |
| Context | 204K | 1M | 1T MoE | 1M | 1M | 202K |
| SWE-Bench Pro | 56.2% | — | — | — | 59.0% | 58.4% |
| Terminal-Bench 2.1 | — | — | — | — | 66.0% | — |
| Multimodal | No | No | No | No | Yes (img+video) | No |
| License | Open weights | MIT | Open weights | Open source | Open weights | Open source |
| Monthly est. | $7-15 | $10-20 | $15-30 | $15-35 | $7-15 | $15-40 |
Which one should you pick?
On a tight budget: MiniMax M2.7 or MiniMax M3. Both cost $0.30/M input. M3 adds 1M context, native multimodality, and higher SWE-Bench Pro (59.0% vs 56.2%). M2.7 is the proven workhorse, M3 is the upgrade.
Need long context: MiniMax M3 or DeepSeek V4 Pro. M3 gives you 1M context at $0.30/M with frontier coding benchmarks. DeepSeek V4 Pro at $0.435/M has the lowest hallucination rate on the list (6.0%).
Want the strongest agent: MiMo V2.5 Pro or GLM 5.1. Both match Claude Opus 4.6 on agent benchmarks. MiMo V2.5 Pro is slightly better at sustained long-horizon tasks with its token efficiency. GLM 5.1 has the edge on pure coding with its 58.4% SWE-Bench Pro score.
Do not want to choose: OpenCode Go at $10/month gives you all the models. Switch between them based on the task.
Subscription risk reminder
Using your Claude Code, Gemini CLI, or Codex subscription OAuth tokens with Hermes Agent can get your account banned. These providers monitor for automated usage patterns. Use API keys from the providers listed above instead. See our OpenClaw models guide for the full breakdown on why API access is the safe route.
What I actually run
My Hermes setup uses MiniMax M3 as the default model for everyday chat, quick tasks, and long-context work. For complex coding jobs and research tasks, I switch to GLM 5.1 or MiMo V2.5 Pro. DeepSeek V4 Pro handles anything that needs the lowest hallucination rate on live servers.
The fallback configuration looks like this:
hermes config set model minimax/minimax-m3
When I need more power for a specific task:
hermes model
# Select GLM 5.1 or MiMo V2.5 Pro
For most Hermes users, starting with MiniMax M3 and switching up when needed keeps costs low without sacrificing capability. M3’s 1M context and multimodal support make it a significant upgrade over M2.7 at the same price.
FAQ
Which model is cheapest for Hermes Agent?
MiniMax M2.7 and MiniMax M3 both cost $0.30/M input and $1.20/M output. Running Hermes 24/7 with moderate usage costs $7-15/month. M3 adds 1M context and native multimodality at the same price. DeepSeek V4 Pro is second cheapest at $0.435/M input and $0.87/M output.
Which model is strongest for coding?
MiniMax M3 scores 59.0% on SWE-Bench Pro, ahead of GPT-5.5 and Gemini 3.1 Pro. GLM 5.1 scores 58.4%. MiMo V2.5 Pro is strong on agentic tasks with AA Agentic Index 67.4.
Can I use OpenCode Go with Hermes Agent?
Yes. OpenCode Go provides an OpenAI-compatible API endpoint. Set the base URL to https://opencode.ai/zen/go/v1/chat/completions in your Hermes config and use your OpenCode Go API key. At $10/month, it bundles all six models listed here. For a detailed look at limits and benchmarks, see the OpenCode Go guide.
Do these models work through OpenRouter?
Yes. MiniMax M3, MiniMax M2.7, MiMo V2.5 Pro, GLM 5.1, Kimi K2.6, and DeepSeek V4 Pro are all available on OpenRouter. If you already have Hermes configured with an OpenRouter key, you can switch between them without adding new providers.
Is it safe to use my Claude subscription with Hermes?
No. Anthropic monitors for automated usage through OAuth tokens and has suspended accounts for it. Use API keys from the providers listed above. The OpenClaw models guide explains the risks in detail.
Which model has the lowest hallucination rate?
DeepSeek V4 Pro at 6.0% on the AA-Omniscience benchmark. GLM 5.1 reports near-zero hallucinations. For running commands on a live server through Hermes, lower hallucination means fewer mistakes.
For the full Hermes setup chain: start with the installer, set up a dashboard for browser access, configure the built-in web UI if you want SSH-tunneled access, and set up Kanban task boards for structured multi-agent workflows. If you want to try free models first, the Nous Portal guide covers the free promotions that rotate through Hermes partnerships. If you want a terminal coding agent to pair with Hermes, the OpenCode setup guide covers the open-source Claude Code alternative. And with GitHub Copilot moving to usage-based billing, the alternatives listed there apply to any AI coding workflow. For Qwen 3.6 as a model option, the Qwen 3.6 guide covers setup and benchmarks. If you prefer a minimal coding agent with a TypeScript extension system, our Pi coding agent setup guide covers installation, model configuration, and the best extensions including LazyPi.