Cheapest AI Models for Hermes Agent in 2026 (Under $1/M Tokens)

Hermes Agent runs 24/7. It answers messages, executes scheduled jobs, runs skills, and searches the web around the clock. That kind of usage adds up fast if you pick the wrong model. I have been testing different providers on my Hermes instance for months, and the open source landscape has changed a lot since my earlier model recommendations for OpenClaw.

I narrowed it down to eight models that work with Hermes Agent, cost a fraction of what Claude or GPT API access runs you, and in some cases match or beat those proprietary models on coding and agent benchmarks.

What this covers

Eight affordable open source models that work well with Hermes Agent
Per-token pricing, context windows, and coding benchmarks for each
Which model is cheapest, which is strongest, and which sits in the middle
OpenCode Go as a single subscription that bundles all of these models
How to set each model in Hermes Agent

If you have not installed Hermes Agent yet, the setup guide walks through the full process. For dashboard options to manage your agent from a browser, see the best Hermes dashboards roundup. And if you want the built-in web UI, the Hermes dashboard guide covers SSH tunnels, Caddy, and Docker deployment.

The models at a glance

Model	Input $/M tokens	Output $/M tokens	Context	Best For
DeepSeek V4 Flash	$0.098	$0.28	1M	Ultra-cheap, fast tasks
MiMo V2.5	$0.14	$0.28	1M	Budget omnimodal
MiniMax M2.7	$0.25	$1.00	204K	Cheapest quality, daily use
DeepSeek V4 Pro	$0.435	$0.87	1M	Long context on a budget
Kimi K2.6	$0.67	$3.39	1T MoE	Coding + agent swarm
MiMo V2.5 Pro	$0.43	$0.87	1M	Strongest agent, long tasks
GLM 5.2	$1.30	$4.05	1M	Best overall coding
MiniMax M3	$0.30	$1.20	1M	Frontier coding + multimodal

Bottom line

Cheapest: DeepSeek V4 Flash at $0.098/M input — under $5/month for 24/7 use. Best value for quality: MiniMax M3 at $0.30/M input with 1M context and 59% SWE-Bench Pro. Most powerful: GLM 5.2 and MiMo V2.5 Pro.

1. MiniMax M2.7 — The budget pick

This is the model I keep coming back to for everyday Hermes use. At $0.30 per million input tokens and $1.20 per million output tokens, running Hermes 24/7 costs roughly $7 to $15 per month depending on how much you use it. That is less than a coffee subscription.

MiniMax M2.7 (10% Off)

What M2.7 delivers

M2.7 is no slouch for the price. It scores 56.2% on SWE-Bench Pro, which puts it in the same range as models that cost three to five times as much. On the GDPval-AA benchmark for economically valuable tasks, it hits ELO 1495, the highest score among open source models. Debugging, root cause analysis, document generation, multi-step tool calls — it handles all of those without falling apart.

MiniMax also offers M2.7-highspeed, which runs the same model at higher throughput for a slightly higher price. For interactive Hermes sessions where response time matters, it is worth trying.

Spec	Value
Architecture	Mixture-of-Experts (MoE)
Context Window	196K tokens
SWE-Bench Pro	56.2%
GDPval-AA ELO	1,495
Input Cost	$0.30/M tokens
Output Cost	$1.20/M tokens
Cache Read	$0.059/M tokens

Token Plan pricing

MiniMax offers a Token Plan with discounted rates. If you sign up through this link, you get 10% off the Token Plan.

Coding plan tip

The MiniMax Token Plan gives you a flat pool of tokens at a discount. For Hermes Agent, the base M2.7 plan covers most use cases. Subscribe through go.bitdoze.com/minimax for 10% off.

Setting M2.7 in Hermes

hermes config set model minimax/minimax-m2.7

Or set it through the model picker:

hermes model

Select MiniMax and authenticate with your API key.

2. DeepSeek V4 Pro — Long context, low price

DeepSeek V4 Pro gives you a 1 million token context window for $0.435 per million input tokens. Both the longest context and the second cheapest price on this list. If your Hermes conversations get long or you feed it large codebases, this is the model that handles it without losing track.

It runs 1.6 trillion total parameters with 49 billion activated per token and supports both thinking and non-thinking modes.

Spec	Value
Architecture	MoE (1.6T total, 49B active)
Context Window	1M tokens
AA Intelligence Index	51.5 (better than 96% of models)
AA Agentic Index	67.2 (better than 98% of models)
Input Cost	$0.435/M tokens
Output Cost	$0.87/M tokens
Cache Read	$0.003625/M tokens

Where DeepSeek V4 Pro stands out

The hallucination rate on this model is 6.0% on the AA-Omniscience benchmark, the lowest on this list by far. When Hermes runs commands on a live server, that difference matters. It also scores 96.2% on tau2-Bench Telecom for conversational agent reliability.

Output cost is $0.87/M tokens, also the cheapest on this list. If your Hermes usage involves a lot of output — research summaries, code generation, document writing — DeepSeek V4 Pro keeps the bill down.

DeepSeek V4 Pro Announcement

Setting DeepSeek V4 Pro in Hermes

hermes config set model deepseek/deepseek-v4-pro

Add your DeepSeek API key to ~/.hermes/.env:

echo "DEEPSEEK_API_KEY=your-key-here" >> ~/.hermes/.env

3. Kimi K2.6 — Agent swarm built in

Kimi K2.6 from Moonshot AI does something the other models on this list don’t: an agent swarm that spins up hundreds of parallel sub-agents to break down and tackle complex tasks on its own. You don’t have to decompose the work yourself — K2.6 figures it out.

Spec	Value
Architecture	MoE (1T total, 32B active)
Context Window	262K tokens
AA Intelligence Index	53.9 (better than 98% of models)
AA Coding Index	47.1 (better than 95% of models)
AA Agentic Index	66.0 (better than 96% of models)
Input Cost	$0.75/M tokens
Output Cost	$3.50/M tokens

Why K2.6 works for Hermes

K2.6 scores 91.1% on GPQA Diamond for graduate-level scientific reasoning — the highest on this list. It also handles Python, Rust, and Go coding across long-horizon tasks. The Agent Swarm feature means that when Hermes hits a complex task, K2.6 can internally decompose it and work on pieces in parallel.

Moonshot AI offers Kimi Code as a subscription service. Plans start at $15/month for the Moderato tier. If you use Hermes primarily for coding tasks, the Kimi Code subscription gives you a managed experience with K2.6 baked in.

Kimi K2.6 Model Page

Setting K2.6 in Hermes

hermes config set model moonshotai/kimi-k2.6

Add your Moonshot API key:

echo "MOONSHOT_API_KEY=your-key-here" >> ~/.hermes/.env

4. Xiaomi MiMo V2.5 Pro — The agent powerhouse

MiMo V2.5 Pro is Xiaomi’s flagship model and one of the two strongest options on this list. It was built from the ground up for agent scenarios — complex software engineering, long-horizon tasks, and workflows that involve hundreds of tool calls in a single session.

During internal testing, MiMo V2.5 Pro completed a full SysY compiler in Rust in 4.3 hours with 672 tool calls, scoring a perfect 233/233 on the hidden test set. A task that takes undergraduate students at Peking University several weeks. It also built a working video editor web application — 8,192 lines of code across 1,868 tool invocations — in 11.5 hours of autonomous work.

MiMo V2.5 Pro Docs

Spec	Value
Context Window	1M tokens
AA Intelligence Index	53.8 (better than 98% of models)
AA Coding Index	45.5 (better than 94% of models)
AA Agentic Index	67.4 (better than 98% of models)
Input Cost (up to 256K)	$1.00/M tokens
Output Cost (up to 256K)	$3.00/M tokens
Input Cost (over 256K)	$2.00/M tokens
Output Cost (over 256K)	$6.00/M tokens
Cache Read	$0.20/M tokens

Token efficiency advantage

MiMo V2.5 Pro is optimized for token efficiency. On the ClawEval agent benchmark, it achieves the same score as Kimi K2.6 while using 42% fewer tokens. That means the higher per-token price gets offset by needing fewer tokens to complete the same task.

The MiMo Token Plan starts at $72/year for the Lite tier (720 million credits). The Pro tier at $600/year gives 8.4 billion credits. Off-peak hours (16:00-24:00 UTC) get a 20% discount on top of the plan rate.

MiMo Token Plan ($2 Bonus)

MiMo bonus

Setting MiMo V2.5 Pro in Hermes

hermes config set model xiaomi/mimo-v2.5-pro

Add your MiMo API key:

echo "MIMO_API_KEY=your-key-here" >> ~/.hermes/.env

MiMo V2.5 Pro is also available on OpenRouter, so if you already have Hermes configured with an OpenRouter key, you can select it from the model list without adding a new provider.

5. MiniMax M3 — Frontier coding with 1M context

MiniMax M3 is the latest flagship from MiniMax, released June 1, 2026. It is the first open-weight model to combine frontier coding, a 1-million-token context window, and native multimodality (image and video input). Built on MiniMax Sparse Attention (MSA), it cuts per-token compute at 1M context to one-twentieth of the prior M2.7 generation while running 9x faster prefill and 15x faster decoding. At the same $0.30/M input price as M2.7, M3 delivers significantly more capability.

MiniMax M3 (10% Off)

Spec	Value
Architecture	MiniMax Sparse Attention (MSA)
Context Window	1M tokens
Max Output	512K tokens
SWE-Bench Pro	59.0%
Terminal-Bench 2.1	66.0%
BrowseComp	83.5
Multimodal	Native (image + video input)
Input Cost	$0.30/M tokens
Output Cost	$1.20/M tokens
Cache Read	$0.06/M tokens

What M3 brings over M2.7

M3 keeps the same aggressive pricing as M2.7 but adds three things M2.7 never had:

1M context that actually works: MSA makes long-context affordable at 1/20 the compute cost of full attention. For Hermes conversations that span hours or involve large codebases, this matters.
59.0% SWE-Bench Pro: Beats GPT-5.5 and Gemini 3.1 Pro, approaches Claude Opus. M2.7 scored 56.2%.
Native multimodality: Built-in image and video understanding, so Hermes can read screenshots, mockups, and documents without a separate vision model.
83.5 BrowseComp: Surpasses Opus 4.7’s 79.3 on web search and browsing tasks.
66.0% Terminal-Bench 2.1: Strong command-line agent performance for server tasks.

Long-horizon demonstrations

MiniMax backed M3’s launch with three autonomous task demonstrations:

Paper reproduction: Autonomously reproduced an ICLR 2025 paper in 12 hours (18 commits, 23 figures)
CUDA kernel optimization: Pushed FP8 hardware utilization from 7.6% to 71.3% over a 24-hour run
Autonomous model training: Scored 0.37 on PostTrainBench, training another model end-to-end

Token Plan pricing

MiniMax offers monthly token plans for M3: $20/month (Plus, ~1.7B tokens), $50/month (Max, ~5.1B tokens), and $120/month (Ultra, ~9.8B tokens). Sign up through go.bitdoze.com/minimax for 10% off.

Setting M3 in Hermes

hermes config set model minimax/minimax-m3

M3 is available on OpenRouter as minimax/minimax-m3, so if you already have Hermes configured with an OpenRouter key, you can select it from the model list.

6. GLM 5.2 — The strongest overall

GLM 5.2 from Z.AI is the newest and strongest model on this list, released June 16, 2026. On SWE-Bench Pro, it scores 62.1% — ahead of GPT-5.5 and Gemini 3.1 Pro. On Terminal-Bench 2.1, it hits 81.0%, within 4 points of Claude Opus 4.8. It introduces a 1M token context window (up from 200K) and effort level control for balancing capability against cost.

GLM 5.2 Announcement

Spec	Value
Parameters	753B
Context Window	1M tokens
Max Output	128K tokens
SWE-Bench Pro	62.1%
Terminal-Bench 2.1	81.0%
Input Cost	$1.40/M tokens
Output Cost	$4.40/M tokens

What makes GLM 5.2 different

GLM 5.2 is the strongest open source model available. It scores 62.1% on SWE-Bench Pro and 81.0% on Terminal-Bench 2.1, making it the closest open source model to Claude Opus 4.8. The new 1M context window uses IndexShare to reduce compute cost, and effort level control lets you choose between High (faster, cheaper) and Max (best results) modes.

For Hermes Agent, that means GLM 5.2 handles long-running scheduled tasks — morning briefings, server monitoring, complex research jobs — without losing the thread mid-execution. The 1M context also means it can process entire codebases in a single session.

Z.AI offers GLM Coding Plans starting at $18/month for the Lite tier. GLM-5.2 consumes quota at 3× during peak hours and 2× during off-peak hours. Through September 2026, off-peak usage is billed at 1×.

GLM Coding Plans (10% Off)

GLM discount

Setting GLM 5.2 in Hermes

hermes config set model z-ai/glm-5.2

Add your Z.AI API key:

echo "ZAI_API_KEY=your-key-here" >> ~/.hermes/.env

GLM 5.2 is available on OpenRouter as well.

OpenCode Go — All five models, one subscription

If you do not want to manage separate API keys and billing for each provider, the OpenCode Go $10/month plan bundles these models (plus Grok 4.5, Kimi K3, and others) into a single subscription. For a detailed look at limits and real-world usage, see the full OpenCode Go review.

OpenCode Go

What OpenCode Go includes

$5 for your first month, then $10/month
Access to MiniMax M3, MiniMax M2.7, MiMo V2.5 Pro, GLM 5.2, Kimi K2.6, DeepSeek V4 Pro, and more
Models hosted in the US, EU, and Singapore for stable global access
Zero-retention policy — providers do not use your data for training

Usage limits

OpenCode Go caps usage at $12 per 5 hours, $30 per week, and $60 per month. Cheaper models like MiniMax M2.7 let you make more requests within those limits. The estimated request counts:

Model	Requests per 5 hours	Requests per week	Requests per month
MiniMax M3	3,400	8,500	17,000
MiniMax M2.7	3,400	8,500	17,000
DeepSeek V4 Pro	3,450	8,550	17,150
Kimi K2.6	1,150	2,880	5,750
MiMo V2.5 Pro	1,290	3,225	6,450
GLM 5.2	620	1,550	3,100

At $10/month, OpenCode Go costs less than most individual provider plans and gives you the flexibility to switch between models depending on the task. For Hermes Agent, you can set the OpenCode Go endpoint as a custom provider and pick whichever model fits the job.

Setting up OpenCode Go in Hermes

Add the OpenCode Go endpoint to ~/.hermes/.env:

echo "OPENAI_BASE_URL=https://opencode.ai/zen/go/v1/chat/completions" >> ~/.hermes/.env
echo "OPENAI_API_KEY=your-opencode-go-key" >> ~/.hermes/.env

Then set the model:

hermes config set model opencode-go/minimax-m3

Switch models anytime:

hermes model

Head-to-head comparison

Feature	MiniMax M2.7	DeepSeek V4 Pro	Kimi K2.6	MiMo V2.5 Pro	MiniMax M3	GLM 5.2
Input $/M	$0.25	$0.435	$0.67	$0.43	$0.30	$1.30
Output $/M	$1.00	$0.87	$3.39	$0.87	$1.20	$4.05
Context	204K	1M	1T MoE	1M	1M	1M
SWE-Bench Pro	56.2%	—	—	—	59.0%	62.1%
Terminal-Bench 2.1	—	—	—	—	66.0%	81.0%
Multimodal	No	No	No	No	Yes (img+video)	No
License	Open weights	MIT	Open weights	Open source	Open weights	Open source
Monthly est.	$7-15	$10-20	$15-30	$15-35	$7-15	$20-50

Which one should you pick?

On a tight budget: MiniMax M2.7 or MiniMax M3. Both cost $0.30/M input. M3 adds 1M context, native multimodality, and higher SWE-Bench Pro (59.0% vs 56.2%). M2.7 is the proven workhorse, M3 is the upgrade.

Need long context: MiniMax M3 or DeepSeek V4 Pro. M3 gives you 1M context at $0.30/M with frontier coding benchmarks. DeepSeek V4 Pro at $0.435/M has the lowest hallucination rate on the list (6.0%).

Want the strongest agent: MiMo V2.5 Pro or GLM 5.2. Both are top performers on agent benchmarks. MiMo V2.5 Pro is slightly better at sustained long-horizon tasks with its token efficiency. GLM 5.2 has the edge on pure coding with its 62.1% SWE-Bench Pro score and 81.0% Terminal-Bench 2.1.

Do not want to choose: OpenCode Go at $10/month gives you all the models. Switch between them based on the task.

Subscription risk reminder

Using your Claude Code, Gemini CLI, or Codex subscription OAuth tokens with Hermes Agent can get your account banned. These providers monitor for automated usage patterns. Use API keys from the providers listed above instead. See our OpenClaw models guide for the full breakdown on why API access is the safe route.

What I actually run

My Hermes setup uses MiniMax M3 as the default model for everyday chat, quick tasks, and long-context work. For complex coding jobs and research tasks, I switch to GLM 5.2 or MiMo V2.5 Pro. DeepSeek V4 Pro handles anything that needs the lowest hallucination rate on live servers.

The fallback configuration looks like this:

hermes config set model minimax/minimax-m3

When I need more power for a specific task:

hermes model
# Select GLM 5.2 or MiMo V2.5 Pro

For most Hermes users, starting with MiniMax M3 and switching up when needed keeps costs low without sacrificing capability. M3’s 1M context and multimodal support make it a significant upgrade over M2.7 at the same price.

FAQ

Which model is cheapest for Hermes Agent?

MiniMax M2.7 and MiniMax M3 both cost $0.30/M input and $1.20/M output. Running Hermes 24/7 with moderate usage costs $7-15/month. M3 adds 1M context and native multimodality at the same price. DeepSeek V4 Pro is second cheapest at $0.435/M input and $0.87/M output.

Which model is strongest for coding?

GLM 5.2 scores 62.1% on SWE-Bench Pro, the highest among open source models. MiniMax M3 scores 59.0%. MiMo V2.5 Pro is strong on agentic tasks with AA Agentic Index 67.4.

Can I use OpenCode Go with Hermes Agent?

Yes. OpenCode Go provides an OpenAI-compatible API endpoint. Set the base URL to https://opencode.ai/zen/go/v1/chat/completions in your Hermes config and use your OpenCode Go API key. At $10/month, it bundles these models plus Grok 4.5, Kimi K3, and more. For a detailed look at limits and benchmarks, see the OpenCode Go review.

Do these models work through OpenRouter?

Yes. MiniMax M3, MiniMax M2.7, MiMo V2.5 Pro, GLM 5.2, Kimi K2.6, and DeepSeek V4 Pro are all available on OpenRouter. If you already have Hermes configured with an OpenRouter key, you can switch between them without adding new providers.

Is it safe to use my Claude subscription with Hermes?

No. Anthropic monitors for automated usage through OAuth tokens and has suspended accounts for it. Use API keys from the providers listed above. The OpenClaw models guide explains the risks in detail.

Which model has the lowest hallucination rate?

DeepSeek V4 Pro at 6.0% on the AA-Omniscience benchmark. GLM 5.2 reports near-zero hallucinations. For running commands on a live server through Hermes, lower hallucination means fewer mistakes.

For the full Hermes setup chain: start with the installer, set up a dashboard for browser access, configure the built-in web UI if you want SSH-tunneled access, and set up Kanban task boards for structured multi-agent workflows. If you want to try free models first, the Nous Portal guide covers the free promotions that rotate through Hermes partnerships. If you want a terminal coding agent to pair with Hermes, the OpenCode setup guide covers the open-source Claude Code alternative. And with GitHub Copilot moving to usage-based billing, the alternatives listed there apply to any AI coding workflow. For Qwen 3.6 as a model option, the Qwen 3.6 guide covers setup and benchmarks. If you prefer a minimal coding agent with a TypeScript extension system, our Pi coding agent setup guide covers installation, model configuration, and the best extensions including LazyPi.

What this covers

The models at a glance

Bottom line

1. MiniMax M2.7 — The budget pick

What M2.7 delivers

Token Plan pricing

Coding plan tip

Setting M2.7 in Hermes

2. DeepSeek V4 Pro — Long context, low price

Where DeepSeek V4 Pro stands out

Setting DeepSeek V4 Pro in Hermes

3. Kimi K2.6 — Agent swarm built in

Why K2.6 works for Hermes

Setting K2.6 in Hermes

4. Xiaomi MiMo V2.5 Pro — The agent powerhouse

Token efficiency advantage

MiMo bonus

Setting MiMo V2.5 Pro in Hermes

5. MiniMax M3 — Frontier coding with 1M context

What M3 brings over M2.7

Long-horizon demonstrations

Token Plan pricing

Setting M3 in Hermes

6. GLM 5.2 — The strongest overall

What makes GLM 5.2 different

GLM discount

Setting GLM 5.2 in Hermes

OpenCode Go — All five models, one subscription

What OpenCode Go includes

Usage limits

Setting up OpenCode Go in Hermes

Head-to-head comparison

Which one should you pick?

Subscription risk reminder

What I actually run

FAQ

Related Posts

Best Hermes Agent Dashboards & Mobile Apps (2026 Guide)

TinyFish Review: Free Web Search and Fetch API for AI Coding Agents

Pi Coding Agent Setup Guide: Install, Configure Models, and Best Extensions

Table of Contents