---
title: "Cheapest AI Models for Hermes Agent in 2026 (Under $1/M Tokens)"
description: "8 affordable models for Hermes Agent — DeepSeek V4 Flash at $0.10/M tokens, MiMo V2.5, MiniMax M2.7, and more. Pricing benchmarks and which to pick for coding vs chat."
date: 2026-06-12
categories: ["AI"]
tags: ["ai-tools","hermes","llm"]
---

import Button from "@components/widgets/Button.astro";
import Notice from "@components/widgets/Notice.astro";
import ListCheck from "@components/widgets/ListCheck.astro";
import Accordion from "@components/widgets/Accordion.astro";
import Tabs from "@components/widgets/Tabs.astro";
import Tab from "@components/widgets/Tab.astro";

Hermes Agent runs 24/7. It answers messages, executes scheduled jobs, runs skills, and searches the web around the clock. That kind of usage adds up fast if you pick the wrong model. I have been testing different providers on my Hermes instance for months, and the open source landscape has changed a lot since my earlier [model recommendations for OpenClaw](/best-opensource-models-for-openclaw/).

I narrowed it down to eight models that work with Hermes Agent, cost a fraction of what Claude or GPT API access runs you, and in some cases match or beat those proprietary models on coding and agent benchmarks.

<Notice type="info" title="What this covers">
<ListCheck>
<ul>
<li>Eight affordable open source models that work well with Hermes Agent</li>
<li>Per-token pricing, context windows, and coding benchmarks for each</li>
<li>Which model is cheapest, which is strongest, and which sits in the middle</li>
<li>OpenCode Go as a single subscription that bundles all of these models</li>
<li>How to set each model in Hermes Agent</li>
</ul>
</ListCheck>
</Notice>

If you have not installed Hermes Agent yet, the [setup guide](/hermes-agent-setup-guide/) walks through the full process. For dashboard options to manage your agent from a browser, see the [best Hermes dashboards](/best-hermes-dashboards/) roundup. And if you want the built-in web UI, the [Hermes dashboard guide](/hermes-dashboard-guide/) covers SSH tunnels, Caddy, and Docker deployment.

## The models at a glance

| Model | Input $/M tokens | Output $/M tokens | Context | Best For |
|-------|------------------|--------------------|---------|----------|
| **DeepSeek V4 Flash** | $0.098 | $0.28 | 1M | Ultra-cheap, fast tasks |
| **MiMo V2.5** | $0.14 | $0.28 | 1M | Budget omnimodal |
| **MiniMax M2.7** | $0.25 | $1.00 | 204K | Cheapest quality, daily use |
| **DeepSeek V4 Pro** | $0.435 | $0.87 | 1M | Long context on a budget |
| **Kimi K2.6** | $0.67 | $3.39 | 1T MoE | Coding + agent swarm |
| **MiMo V2.5 Pro** | $0.43 | $0.87 | 1M | Strongest agent, long tasks |
| **GLM 5.1** | $0.98 | $3.08 | 202K | Best overall coding |
| **MiniMax M3** | $0.30 | $1.20 | 1M | Frontier coding + multimodal |

<Notice type="success" title="Bottom line">
**Cheapest:** DeepSeek V4 Flash at $0.098/M input — under $5/month for 24/7 use. **Best value for quality:** MiniMax M3 at $0.30/M input with 1M context and 59% SWE-Bench Pro. **Most powerful:** GLM 5.1 and MiMo V2.5 Pro.
</Notice>

## 1. MiniMax M2.7 — The budget pick

This is the model I keep coming back to for everyday Hermes use. At $0.30 per million input tokens and $1.20 per million output tokens, running Hermes 24/7 costs roughly $7 to $15 per month depending on how much you use it. That is less than a coffee subscription.

<Button text="MiniMax M2.7 (10% Off)" link="https://go.bitdoze.com/minimax" variant="solid" color="blue" size="md" icon="arrow-right" />

### What M2.7 delivers

M2.7 is no slouch for the price. It scores 56.2% on SWE-Bench Pro, which puts it in the same range as models that cost three to five times as much. On the GDPval-AA benchmark for economically valuable tasks, it hits ELO 1495, the highest score among open source models. Debugging, root cause analysis, document generation, multi-step tool calls — it handles all of those without falling apart.

MiniMax also offers M2.7-highspeed, which runs the same model at higher throughput for a slightly higher price. For interactive Hermes sessions where response time matters, it is worth trying.

| Spec | Value |
|------|-------|
| **Architecture** | Mixture-of-Experts (MoE) |
| **Context Window** | 196K tokens |
| **SWE-Bench Pro** | 56.2% |
| **GDPval-AA ELO** | 1,495 |
| **Input Cost** | $0.30/M tokens |
| **Output Cost** | $1.20/M tokens |
| **Cache Read** | $0.059/M tokens |

### Token Plan pricing

MiniMax offers a [Token Plan](https://platform.minimax.io/subscribe/token-plan) with discounted rates. If you sign up through [this link](https://go.bitdoze.com/minimax), you get 10% off the Token Plan.

<Notice type="info" title="Coding plan tip">
The MiniMax Token Plan gives you a flat pool of tokens at a discount. For Hermes Agent, the base M2.7 plan covers most use cases. Subscribe through [go.bitdoze.com/minimax](https://go.bitdoze.com/minimax) for 10% off.
</Notice>

### Setting M2.7 in Hermes

```bash
hermes config set model minimax/minimax-m2.7
```

Or set it through the model picker:

```bash
hermes model
```

Select MiniMax and authenticate with your API key.

## 2. DeepSeek V4 Pro — Long context, low price

DeepSeek V4 Pro gives you a 1 million token context window for $0.435 per million input tokens. Both the longest context and the second cheapest price on this list. If your Hermes conversations get long or you feed it large codebases, this is the model that handles it without losing track.

It runs 1.6 trillion total parameters with 49 billion activated per token and supports both thinking and non-thinking modes.

| Spec | Value |
|------|-------|
| **Architecture** | MoE (1.6T total, 49B active) |
| **Context Window** | 1M tokens |
| **AA Intelligence Index** | 51.5 (better than 96% of models) |
| **AA Agentic Index** | 67.2 (better than 98% of models) |
| **Input Cost** | $0.435/M tokens |
| **Output Cost** | $0.87/M tokens |
| **Cache Read** | $0.003625/M tokens |

### Where DeepSeek V4 Pro stands out

The hallucination rate on this model is 6.0% on the AA-Omniscience benchmark, the lowest on this list by far. When Hermes runs commands on a live server, that difference matters. It also scores 96.2% on tau2-Bench Telecom for conversational agent reliability.

Output cost is $0.87/M tokens, also the cheapest on this list. If your Hermes usage involves a lot of output — research summaries, code generation, document writing — DeepSeek V4 Pro keeps the bill down.

<Button text="DeepSeek V4 Pro Announcement" link="https://api-docs.deepseek.com/news/news260424" variant="outline" color="blue" size="md" icon="arrow-right" />

### Setting DeepSeek V4 Pro in Hermes

```bash
hermes config set model deepseek/deepseek-v4-pro
```

Add your DeepSeek API key to `~/.hermes/.env`:

```bash
echo "DEEPSEEK_API_KEY=your-key-here" >> ~/.hermes/.env
```

## 3. Kimi K2.6 — Agent swarm built in

Kimi K2.6 from Moonshot AI does something the other models on this list don't: an agent swarm that spins up hundreds of parallel sub-agents to break down and tackle complex tasks on its own. You don't have to decompose the work yourself — K2.6 figures it out.

| Spec | Value |
|------|-------|
| **Architecture** | MoE (1T total, 32B active) |
| **Context Window** | 262K tokens |
| **AA Intelligence Index** | 53.9 (better than 98% of models) |
| **AA Coding Index** | 47.1 (better than 95% of models) |
| **AA Agentic Index** | 66.0 (better than 96% of models) |
| **Input Cost** | $0.75/M tokens |
| **Output Cost** | $3.50/M tokens |

### Why K2.6 works for Hermes

K2.6 scores 91.1% on GPQA Diamond for graduate-level scientific reasoning — the highest on this list. It also handles Python, Rust, and Go coding across long-horizon tasks. The Agent Swarm feature means that when Hermes hits a complex task, K2.6 can internally decompose it and work on pieces in parallel.

Moonshot AI offers [Kimi Code](https://www.kimi.com/code) as a subscription service. Plans start at $15/month for the Moderato tier. If you use Hermes primarily for coding tasks, the Kimi Code subscription gives you a managed experience with K2.6 baked in.

<Button text="Kimi K2.6 Model Page" link="https://www.kimi.com/ai-models/kimi-k2-6" variant="solid" color="purple" size="md" icon="arrow-right" />

### Setting K2.6 in Hermes

```bash
hermes config set model moonshotai/kimi-k2.6
```

Add your Moonshot API key:

```bash
echo "MOONSHOT_API_KEY=your-key-here" >> ~/.hermes/.env
```

## 4. Xiaomi MiMo V2.5 Pro — The agent powerhouse

MiMo V2.5 Pro is Xiaomi's flagship model and one of the two strongest options on this list. It was built from the ground up for agent scenarios — complex software engineering, long-horizon tasks, and workflows that involve hundreds of tool calls in a single session.

During internal testing, MiMo V2.5 Pro completed a full SysY compiler in Rust in 4.3 hours with 672 tool calls, scoring a perfect 233/233 on the hidden test set. A task that takes undergraduate students at Peking University several weeks. It also built a working video editor web application — 8,192 lines of code across 1,868 tool invocations — in 11.5 hours of autonomous work.

<Button text="MiMo V2.5 Pro Docs" link="https://platform.xiaomimimo.com/docs/en-US/news/v2.5-news" variant="solid" color="blue" size="md" icon="arrow-right" />

| Spec | Value |
|------|-------|
| **Context Window** | 1M tokens |
| **AA Intelligence Index** | 53.8 (better than 98% of models) |
| **AA Coding Index** | 45.5 (better than 94% of models) |
| **AA Agentic Index** | 67.4 (better than 98% of models) |
| **Input Cost (up to 256K)** | $1.00/M tokens |
| **Output Cost (up to 256K)** | $3.00/M tokens |
| **Input Cost (over 256K)** | $2.00/M tokens |
| **Output Cost (over 256K)** | $6.00/M tokens |
| **Cache Read** | $0.20/M tokens |

### Token efficiency advantage

MiMo V2.5 Pro is optimized for token efficiency. On the ClawEval agent benchmark, it achieves the same score as Kimi K2.6 while using 42% fewer tokens. That means the higher per-token price gets offset by needing fewer tokens to complete the same task.

The [MiMo Token Plan](https://platform.xiaomimimo.com/token-plan) starts at $72/year for the Lite tier (720 million credits). The Pro tier at $600/year gives 8.4 billion credits. Off-peak hours (16:00-24:00 UTC) get a 20% discount on top of the plan rate.

<Button text="MiMo Token Plan ($2 Bonus)" link="https://go.bitdoze.com/mimo" variant="solid" color="green" size="md" icon="arrow-right" />

<Notice type="info" title="MiMo bonus">
Sign up through [go.bitdoze.com/mimo](https://go.bitdoze.com/mimo) and get a $2 bonus credit on the MiMo Token Plan.
</Notice>

### Setting MiMo V2.5 Pro in Hermes

```bash
hermes config set model xiaomi/mimo-v2.5-pro
```

Add your MiMo API key:

```bash
echo "MIMO_API_KEY=your-key-here" >> ~/.hermes/.env
```

MiMo V2.5 Pro is also available on OpenRouter, so if you already have Hermes configured with an OpenRouter key, you can select it from the model list without adding a new provider.

## 5. MiniMax M3 — Frontier coding with 1M context

MiniMax M3 is the latest flagship from MiniMax, released June 1, 2026. It is the first open-weight model to combine frontier coding, a 1-million-token context window, and native multimodality (image and video input). Built on MiniMax Sparse Attention (MSA), it cuts per-token compute at 1M context to one-twentieth of the prior M2.7 generation while running 9x faster prefill and 15x faster decoding. At the same $0.30/M input price as M2.7, M3 delivers significantly more capability.

<Button text="MiniMax M3 (10% Off)" link="https://go.bitdoze.com/minimax" variant="solid" color="purple" size="md" icon="arrow-right" />

| Spec | Value |
|------|-------|
| **Architecture** | MiniMax Sparse Attention (MSA) |
| **Context Window** | 1M tokens |
| **Max Output** | 512K tokens |
| **SWE-Bench Pro** | 59.0% |
| **Terminal-Bench 2.1** | 66.0% |
| **BrowseComp** | 83.5 |
| **Multimodal** | Native (image + video input) |
| **Input Cost** | $0.30/M tokens |
| **Output Cost** | $1.20/M tokens |
| **Cache Read** | $0.06/M tokens |

### What M3 brings over M2.7

M3 keeps the same aggressive pricing as M2.7 but adds three things M2.7 never had:

- **1M context that actually works**: MSA makes long-context affordable at 1/20 the compute cost of full attention. For Hermes conversations that span hours or involve large codebases, this matters.
- **59.0% SWE-Bench Pro**: Beats GPT-5.5 and Gemini 3.1 Pro, approaches Claude Opus. M2.7 scored 56.2%.
- **Native multimodality**: Built-in image and video understanding, so Hermes can read screenshots, mockups, and documents without a separate vision model.
- **83.5 BrowseComp**: Surpasses Opus 4.7's 79.3 on web search and browsing tasks.
- **66.0% Terminal-Bench 2.1**: Strong command-line agent performance for server tasks.

### Long-horizon demonstrations

MiniMax backed M3's launch with three autonomous task demonstrations:

- **Paper reproduction**: Autonomously reproduced an ICLR 2025 paper in 12 hours (18 commits, 23 figures)
- **CUDA kernel optimization**: Pushed FP8 hardware utilization from 7.6% to 71.3% over a 24-hour run
- **Autonomous model training**: Scored 0.37 on PostTrainBench, training another model end-to-end

### Token Plan pricing

MiniMax offers monthly token plans for M3: $20/month (Plus, ~1.7B tokens), $50/month (Max, ~5.1B tokens), and $120/month (Ultra, ~9.8B tokens). Sign up through [go.bitdoze.com/minimax](https://go.bitdoze.com/minimax) for 10% off.

### Setting M3 in Hermes

```bash
hermes config set model minimax/minimax-m3
```

M3 is available on OpenRouter as `minimax/minimax-m3`, so if you already have Hermes configured with an OpenRouter key, you can select it from the model list.

## 6. GLM 5.1 — The strongest overall

GLM 5.1 from Z.AI is the strongest model on this list. On SWE-Bench Pro, it scores 58.4% — ahead of GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. On LMArena Code, it ranks number one among open source models and number three globally. It can work autonomously on a single task for up to 8 hours, maintaining goal alignment without drifting.

<Button text="GLM 5.1 Documentation" link="https://docs.z.ai/guides/llm/glm-5.1" variant="solid" color="blue" size="md" icon="arrow-right" />

| Spec | Value |
|------|-------|
| **Context Window** | 200K tokens |
| **Max Output** | 128K tokens |
| **SWE-Bench Pro** | 58.4% |
| **Positioning** | Aligned with Claude Opus 4.6 |
| **Input Cost** | $1.05/M tokens |
| **Output Cost** | $3.50/M tokens |

### What makes GLM 5.1 different

GLM 5.1 is the only model on this list that claims 8-hour sustained execution capability. In internal testing, it carried out 655 iterations on a vector database optimization task, boosting query throughput to 6.9x the initial production version. On KernelBench Level 3, it achieved a 3.6x geometric mean speedup through thousands of tool-invocation-driven optimizations.

For Hermes Agent, that means GLM 5.1 handles long-running scheduled tasks — morning briefings, server monitoring, complex research jobs — without losing the thread mid-execution.

Z.AI offers [GLM Coding Plans](https://z.ai/subscribe) starting at $18/month for the Lite tier. The Pro tier at $72/month includes MCP tools and faster generation speeds.

<Button text="GLM Coding Plans (10% Off)" link="https://go.bitdoze.com/glm" variant="solid" color="green" size="md" icon="arrow-right" />

<Notice type="info" title="GLM discount">
Sign up through [go.bitdoze.com/glm](https://go.bitdoze.com/glm) and get 10% off GLM Coding Plans.
</Notice>

### Setting GLM 5.1 in Hermes

```bash
hermes config set model z-ai/glm-5.1
```

Add your Z.AI API key:

```bash
echo "ZAI_API_KEY=your-key-here" >> ~/.hermes/.env
```

GLM 5.1 is available on OpenRouter as well.

## OpenCode Go — All five models, one subscription

If you do not want to manage separate API keys and billing for each provider, [OpenCode Go](https://go.bitdoze.com/opencode-go) bundles all six models into a single subscription for $10/month. For a detailed look at limits and real-world usage, see the [OpenCode Go guide](/opencode-go-plan/).

<Button text="OpenCode Go" link="https://go.bitdoze.com/opencode-go" variant="solid" color="purple" size="md" icon="arrow-right" />

### What OpenCode Go includes

- **$5 for your first month**, then $10/month
- Access to MiniMax M3, MiniMax M2.7, MiMo V2.5 Pro, GLM 5.1, Kimi K2.6, DeepSeek V4 Pro, and more
- Models hosted in the US, EU, and Singapore for stable global access
- Zero-retention policy — providers do not use your data for training

### Usage limits

OpenCode Go caps usage at $12 per 5 hours, $30 per week, and $60 per month. Cheaper models like MiniMax M2.7 let you make more requests within those limits. The estimated request counts:

| Model | Requests per 5 hours | Requests per week | Requests per month |
|-------|---------------------|-------------------|--------------------|
| MiniMax M3 | 3,400 | 8,500 | 17,000 |
| MiniMax M2.7 | 3,400 | 8,500 | 17,000 |
| DeepSeek V4 Pro | 3,450 | 8,550 | 17,150 |
| Kimi K2.6 | 1,150 | 2,880 | 5,750 |
| MiMo V2.5 Pro | 1,290 | 3,225 | 6,450 |
| GLM 5.1 | 880 | 2,150 | 4,300 |

At $10/month, OpenCode Go costs less than most individual provider plans and gives you the flexibility to switch between models depending on the task. For Hermes Agent, you can set the OpenCode Go endpoint as a custom provider and pick whichever model fits the job.

### Setting up OpenCode Go in Hermes

Add the OpenCode Go endpoint to `~/.hermes/.env`:

```bash
echo "OPENAI_BASE_URL=https://opencode.ai/zen/go/v1/chat/completions" >> ~/.hermes/.env
echo "OPENAI_API_KEY=your-opencode-go-key" >> ~/.hermes/.env
```

Then set the model:

```bash
hermes config set model opencode-go/minimax-m3
```

Switch models anytime:

```bash
hermes model
```

## Head-to-head comparison

| Feature | MiniMax M2.7 | DeepSeek V4 Pro | Kimi K2.6 | MiMo V2.5 Pro | MiniMax M3 | GLM 5.1 |
|---------|-------------|-----------------|-----------|---------------|------------|---------|
| **Input $/M** | $0.25 | $0.435 | $0.67 | $0.43 | $0.30 | $0.98 |
| **Output $/M** | $1.00 | $0.87 | $3.39 | $0.87 | $1.20 | $3.08 |
| **Context** | 204K | 1M | 1T MoE | 1M | 1M | 202K |
| **SWE-Bench Pro** | 56.2% | — | — | — | 59.0% | 58.4% |
| **Terminal-Bench 2.1** | — | — | — | — | 66.0% | — |
| **Multimodal** | No | No | No | No | Yes (img+video) | No |
| **License** | Open weights | MIT | Open weights | Open source | Open weights | Open source |
| **Monthly est.** | $7-15 | $10-20 | $15-30 | $15-35 | $7-15 | $15-40 |

### Which one should you pick?

**On a tight budget:** MiniMax M2.7 or MiniMax M3. Both cost $0.30/M input. M3 adds 1M context, native multimodality, and higher SWE-Bench Pro (59.0% vs 56.2%). M2.7 is the proven workhorse, M3 is the upgrade.

**Need long context:** MiniMax M3 or DeepSeek V4 Pro. M3 gives you 1M context at $0.30/M with frontier coding benchmarks. DeepSeek V4 Pro at $0.435/M has the lowest hallucination rate on the list (6.0%).

**Want the strongest agent:** MiMo V2.5 Pro or GLM 5.1. Both match Claude Opus 4.6 on agent benchmarks. MiMo V2.5 Pro is slightly better at sustained long-horizon tasks with its token efficiency. GLM 5.1 has the edge on pure coding with its 58.4% SWE-Bench Pro score.

**Do not want to choose:** OpenCode Go at $10/month gives you all the models. Switch between them based on the task.

<Notice type="warning" title="Subscription risk reminder">
Using your Claude Code, Gemini CLI, or Codex subscription OAuth tokens with Hermes Agent can get your account banned. These providers monitor for automated usage patterns. Use API keys from the providers listed above instead. See our [OpenClaw models guide](/best-opensource-models-for-openclaw/) for the full breakdown on why API access is the safe route.
</Notice>

## What I actually run

My Hermes setup uses MiniMax M3 as the default model for everyday chat, quick tasks, and long-context work. For complex coding jobs and research tasks, I switch to GLM 5.1 or MiMo V2.5 Pro. DeepSeek V4 Pro handles anything that needs the lowest hallucination rate on live servers.

The fallback configuration looks like this:

```bash
hermes config set model minimax/minimax-m3
```

When I need more power for a specific task:

```bash
hermes model
# Select GLM 5.1 or MiMo V2.5 Pro
```

For most Hermes users, starting with MiniMax M3 and switching up when needed keeps costs low without sacrificing capability. M3's 1M context and multimodal support make it a significant upgrade over M2.7 at the same price.

## FAQ

<Accordion label="Which model is cheapest for Hermes Agent?" group="faq" expanded="true">
MiniMax M2.7 and MiniMax M3 both cost $0.30/M input and $1.20/M output. Running Hermes 24/7 with moderate usage costs $7-15/month. M3 adds 1M context and native multimodality at the same price. DeepSeek V4 Pro is second cheapest at $0.435/M input and $0.87/M output.
</Accordion>

<Accordion label="Which model is strongest for coding?" group="faq">
MiniMax M3 scores 59.0% on SWE-Bench Pro, ahead of GPT-5.5 and Gemini 3.1 Pro. GLM 5.1 scores 58.4%. MiMo V2.5 Pro is strong on agentic tasks with AA Agentic Index 67.4.
</Accordion>

<Accordion label="Can I use OpenCode Go with Hermes Agent?" group="faq">
Yes. OpenCode Go provides an OpenAI-compatible API endpoint. Set the base URL to `https://opencode.ai/zen/go/v1/chat/completions` in your Hermes config and use your OpenCode Go API key. At $10/month, it bundles all six models listed here. For a detailed look at limits and benchmarks, see the [OpenCode Go guide](/opencode-go-plan/).
</Accordion>

<Accordion label="Do these models work through OpenRouter?" group="faq">
Yes. MiniMax M3, MiniMax M2.7, MiMo V2.5 Pro, GLM 5.1, Kimi K2.6, and DeepSeek V4 Pro are all available on OpenRouter. If you already have Hermes configured with an OpenRouter key, you can switch between them without adding new providers.
</Accordion>

<Accordion label="Is it safe to use my Claude subscription with Hermes?" group="faq">
No. Anthropic monitors for automated usage through OAuth tokens and has suspended accounts for it. Use API keys from the providers listed above. The [OpenClaw models guide](/best-opensource-models-for-openclaw/) explains the risks in detail.
</Accordion>

<Accordion label="Which model has the lowest hallucination rate?" group="faq">
DeepSeek V4 Pro at 6.0% on the AA-Omniscience benchmark. GLM 5.1 reports near-zero hallucinations. For running commands on a live server through Hermes, lower hallucination means fewer mistakes.
</Accordion>

For the full Hermes setup chain: start with the [installer](/hermes-agent-setup-guide/), set up a [dashboard](/best-hermes-dashboards/) for browser access, configure the [built-in web UI](/hermes-dashboard-guide/) if you want SSH-tunneled access, and set up [Kanban task boards](/hermes-kanban-setup-guide/) for structured multi-agent workflows. If you want to try free models first, the [Nous Portal guide](/hermes-agent-mimo-v2-pro/) covers the free promotions that rotate through Hermes partnerships. If you want a terminal coding agent to pair with Hermes, the [OpenCode setup guide](/opencode-setup-guide/) covers the open-source Claude Code alternative. And with [GitHub Copilot moving to usage-based billing](/github-copilot-alternatives-2026/), the alternatives listed there apply to any AI coding workflow. For Qwen 3.6 as a model option, the [Qwen 3.6 guide](/qwen36-ai-coding-agents/) covers setup and benchmarks. If you prefer a minimal coding agent with a TypeScript extension system, our [Pi coding agent setup guide](/pi-coding-agent-setup-guide/) covers installation, model configuration, and the best extensions including LazyPi.