Qwen 3.6 Models for AI Coding Agents: Setup, Pricing, and Benchmarks
Alibaba's Qwen 3.6 series includes Plus, 27B, 35B-A3B, and Max Preview models. Pricing from $0.33/M input tokens, 1M context, and strong coding benchmarks. How to set them up with Hermes, OpenClaw, and OpenCode.
Alibaba shipped the Qwen 3.6 series in April 2026 and the developer community noticed fast. The r/LocalLLaMA thread announcing Qwen 3.6 hit 760 upvotes with comments like “the performance jump is real” and people reporting it handled tasks they normally only trust Opus and Codex with. I have been testing Qwen 3.6 Plus and Qwen 3.6-27B with Hermes Agent, OpenCode, and OpenClaw for the past two weeks.
What caught my attention: Qwen 3.6 Plus scores 78.8 on SWE-bench Verified, costs $0.33/M input tokens, and has a 1M token context window. For reference, that puts it in the same performance bracket as models that cost three to ten times as much. The open-weight models (27B and 35B-A3B) run on modest hardware and still pull strong numbers.
This guide covers the full Qwen 3.6 lineup, what each model is good at, pricing through different providers, and how to connect them to your coding agents.
What this covers
- Four Qwen 3.6 models: Plus, 27B, 35B-A3B, and Max Preview
- Pricing through Alibaba direct, OpenRouter, and OpenCode Go
- Benchmarks: SWE-bench, Terminal-Bench, GPQA, and agentic tests
- Setup with Hermes Agent, OpenClaw, OpenCode, and Ollama
- Which Qwen 3.6 model to pick for different tasks
If you are still deciding between AI coding agents, our OpenCode setup guide covers the open-source Claude Code alternative, and the GitHub Copilot alternatives article breaks down the options after the June 1 pricing change.
The Qwen 3.6 lineup
Alibaba released four models in the 3.6 series. Each targets a different use case.
| Model | Parameters | Context | Input $/M | Output $/M | License | Best For |
|---|---|---|---|---|---|---|
| Qwen 3.6 Plus | Proprietary MoE | 1M | $0.33 | $1.95 | Closed | Daily coding, agents |
| Qwen 3.6 27B | 27B dense | 262K | ~$0.15 | ~$0.60 | Apache 2.0 | Self-hosted coding |
| Qwen 3.6 35B-A3B | 35B total, 3B active | 262K | ~$0.08 | ~$0.30 | Apache 2.0 | Budget self-hosting |
| Qwen 3.6 Max Preview | ~1T MoE | 262K | Varies | Varies | Closed | Maximum performance |
Qwen 3.6 Plus — The one I use most
This is the workhorse. $0.33/M input tokens with a 1M context window. It builds on a hybrid architecture that combines linear attention with sparse MoE routing. Alibaba tuned it specifically for agentic coding and front-end development.
On SWE-bench Verified it scores 78.8. On the Design Arena benchmark for front-end work, it places in the top 11% for 3D scenes, top 14% for games, and top 16% for UI components. That “vibe coding” experience people talk about — generating usable React components and full-stack apps from a description — this model does it well.
The 1M context window matters for agent work. When Hermes or OpenCode is processing a large repo, the model needs to hold the full file structure, multiple related files, and the conversation history without dropping pieces. 1M tokens handles that.
Qwen 3.6 Plus on OpenRouterQwen 3.6 27B — Self-hosted coding
A dense 27B parameter model released under Apache 2.0. If you have a GPU with 24GB+ VRAM (or 64GB+ RAM for CPU inference), you can run this locally through Ollama and pay zero per-token costs.
It accepts text, image, and video input, has a 262K context window, and includes a built-in thinking mode for extended reasoning. The r/LocalLLaMA community reports it handles repository-level code comprehension, front-end workflows, and multi-step problem solving at a level comparable to much larger models.
Qwen 3.6 35B-A3B — The budget self-hosted option
This is a MoE model with 35B total parameters but only 3B active per token. That means it runs fast on much less hardware than the 27B dense model while delivering comparable performance for many tasks. Apache 2.0 license, 262K native context (extensible to 1M via YaRN).
If you want to self-host a coding model on a VPS without a GPU, this is the one to try. A 3B active model can run on CPU-only hardware at usable speeds.
Qwen 3.6 Max Preview — Maximum performance
Alibaba’s proprietary frontier model. It hit number one on six coding benchmarks on April 20, 2026: SWE-bench Pro, Terminal-Bench 2.0, and SkillsBench among them. About 1 trillion total parameters, 262K context.
This is closed-weights and available only through Alibaba Cloud and Qwen Studio APIs. It is the strongest Qwen model but costs more than the Plus variant. For most coding agent use cases, Plus is the better value.
Pricing comparison
Qwen 3.6 models are available through multiple providers. Prices vary.
Direct from Alibaba (Qwen API)
| Model | Input $/M | Output $/M |
|---|---|---|
| Qwen 3.6 Plus (up to 256K) | $0.50 | $3.00 |
| Qwen 3.6 Plus (over 256K) | $2.00 | $6.00 |
Through OpenRouter
OpenRouter adds a small markup but gives you automatic fallback across providers.
| Model | Input $/M | Output $/M | Cache Read |
|---|---|---|---|
| Qwen 3.6 Plus | $0.33 | $1.95 | $0.033 |
| Qwen 3.6 35B-A3B | ~$0.08 | ~$0.30 | Varies |
| Qwen 3.6 27B | ~$0.15 | ~$0.60 | Varies |
| Qwen 3.6 Max Preview | Varies | Varies | Varies |
The effective weighted average price on OpenRouter for Qwen 3.6 Plus is about $0.40/M input and $2.05/M output. The cache read price of $0.033/M is very low, which benefits agent workflows where the model repeatedly reads the same project files.
Through OpenCode Go
Qwen 3.6 Plus and Qwen 3.5 Plus are both included in OpenCode Go at $10/month. At that price, Qwen 3.6 Plus gives you an estimated 3,300 requests per 5 hours and 16,300 requests per month.
Benchmarks
Coding performance
| Benchmark | Qwen 3.6 Plus | Qwen 3.6 Max Preview |
|---|---|---|
| SWE-bench Verified | 78.8% | #1 (multiple benchmarks) |
| SWE-bench Pro | — | #1 |
| Terminal-Bench 2.0 | — | #1 |
| SkillsBench | — | #1 |
Design Arena (front-end)
| Category | Qwen 3.6 Plus Elo | Ranking |
|---|---|---|
| 3D | 1321 | Top 11% |
| Code Categories | 1292 | Top 14% |
| Game Development | 1293 | Top 14% |
| UI Component | 1301 | Top 16% |
| Website | 1274 | Top 19% |
| SVG | 1249 | Top 16% |
| Data Visualization | 1270 | Top 18% |
Who uses Qwen 3.6?
On OpenRouter, the top apps using Qwen 3.6 Plus this month are Hermes Agent (153B tokens), OpenClaw (147B tokens), Claude Code (56.3B tokens), Roo Code (18.3B tokens), and Cline (17.6B tokens). That tells you the agent ecosystem is already adopting these models at scale.
Setting up Qwen 3.6 with your agents
Hermes Agent
# Via OpenRouter (recommended)
hermes config set model qwen/qwen3.6-plus
# Or set OpenRouter key if not already configured
echo "OPENROUTER_API_KEY=your-key" >> ~/.hermes/.env
OpenCode
/connect
# Select OpenRouter or OpenCode Go
Then /models to pick Qwen 3.6 Plus.
OpenClaw
Edit your config:
{
"agents": {
"defaults": {
"model": {
"primary": "qwen/qwen3.6-plus",
"fallback": ["minimax/minimax-m2.7"]
}
}
}
}
Restart the gateway:
openclaw gateway restart
Ollama (for self-hosted 27B or 35B-A3B)
# Pull the model
ollama pull qwen3.6:27b
# Or the MoE variant
ollama pull qwen3.6:35b-a3b
Then configure your agent to use the local Ollama endpoint:
# Hermes
echo "OPENAI_BASE_URL=http://localhost:11434/v1" >> ~/.hermes/.env
echo "OPENAI_API_KEY=ollama" >> ~/.hermes/.env
hermes config set model ollama/qwen3.6:27b
See our Ollama Docker guide for setting up Ollama on your server.
Which Qwen 3.6 model should you pick?
Everyday coding agent work: Qwen 3.6 Plus. The $0.33/M input price, 1M context, and strong SWE-bench score make it the default choice. It handles most coding tasks without needing to switch to a more expensive model.
Self-hosting with a GPU: Qwen 3.6 27B. Apache 2.0 license, 262K context, strong performance. Runs on a single 24GB GPU.
Self-hosting on a budget: Qwen 3.6 35B-A3B. Only 3B active parameters means it runs on modest hardware, including CPU-only VPS setups. Apache 2.0 license.
Maximum accuracy regardless of cost: Qwen 3.6 Max Preview. Number one on six coding benchmarks. Use it for the hard stuff and fall back to Plus for everything else.
Do not want to choose: OpenCode Go includes both Qwen 3.6 Plus and Qwen 3.5 Plus at $10/month. See the OpenCode Go guide for limits and benchmarks. Switch between them and 10 other models based on the task.
Qwen 3.6 vs the competition
| Feature | Qwen 3.6 Plus | MiniMax M2.7 | GLM 5.1 | DeepSeek V4 Pro |
|---|---|---|---|---|
| Input $/M | $0.33 | $0.30 | $1.05 | $0.435 |
| Output $/M | $1.95 | $1.20 | $3.50 | $0.87 |
| Context | 1M | 196K | 200K | 1M |
| SWE-bench Verified | 78.8% | — | — | — |
| Design/front-end | Strong | Average | Average | Average |
| Hallucination rate | Not published | 65.6% | Near-zero | 6.0% |
| License | Closed | Open weights | Open source | MIT |
Qwen 3.6 Plus sits between MiniMax M2.7 and GLM 5.1 in price. Its 1M context matches DeepSeek V4 Pro. Where it stands out is front-end and UI work — the Design Arena rankings are significantly stronger than any other model at this price point.
For backend and systems coding, GLM 5.1 still has the edge with its 58.4% SWE-bench Pro score. For the absolute cheapest option, MiniMax M2.7 at $0.30/M input is hard to beat.
A practical setup: use Qwen 3.6 Plus as your default model for everything. Switch to DeepSeek V4 Pro when you need the absolute lowest hallucination rate on server commands. Switch to GLM 5.1 for the hardest coding problems.
Related guides
- Best cheap models for Hermes Agent — full pricing comparison across all five major open source models
- OpenCode setup guide — terminal coding agent that works with any Qwen model
- Best open source models for OpenClaw — model recommendations for self-hosted AI agents
- Hermes Agent setup guide — self-improving AI assistant with Qwen support
FAQ
Is Qwen 3.6 Plus free anywhere?
OpenRouter offers a free tier for Qwen 3.6 Plus with rate limits. OpenCode Go ($10/month) includes it without per-token charges up to the monthly usage cap. Direct from Alibaba, there is no free tier but the per-token pricing is competitive.
Can I run Qwen 3.6 locally?
Yes. Qwen 3.6 27B and Qwen 3.6 35B-A3B are both open-weight models under Apache 2.0. The 27B dense model needs a 24GB GPU or 64GB+ RAM. The 35B-A3B MoE model has only 3B active parameters and runs on much less — even CPU-only at usable speeds for simple tasks. Pull them with Ollama: ollama pull qwen3.6:27b or ollama pull qwen3.6:35b-a3b.
How does Qwen 3.6 Plus compare to Claude Sonnet?
Qwen 3.6 Plus costs $0.33/M input versus Claude Sonnet at roughly $3/M input. That is about 9x cheaper. On coding benchmarks, Qwen 3.6 Plus scores 78.8% on SWE-bench Verified. Claude Sonnet scores higher on some benchmarks, but for the price difference, Qwen 3.6 Plus is the better value for most coding tasks. Use Claude for the hardest problems, Qwen for everything else.
What about Qwen 3.6 Max Preview?
Qwen 3.6 Max Preview is Alibaba’s strongest model, hitting number one on six coding benchmarks. It is closed-weights and only available through Alibaba Cloud and Qwen Studio APIs. It costs more than Plus. For most developers, Plus is the better daily driver. Use Max Preview when you need maximum accuracy on a specific hard problem.
Does Qwen 3.6 work with MCP servers?
Yes. Qwen 3.6 Plus supports function calling and structured output, which is what MCP servers use under the hood. When you connect MCP servers through OpenCode, Hermes Agent, or OpenClaw, Qwen 3.6 Plus handles the tool calls like any other compatible model.
For more model comparisons and AI agent setup guides, check out our AI tools category and the OpenClaw alternatives roundup.