AI Model Configuration

Claude Fable 5 — Anthropic's most capable model, frontier reasoning with a 1M-token context; available for the reasoning tier
Claude Opus 4.8 — top Opus-tier model, default for the reasoning tier
Claude Opus 4.7 — reasoning-class, available for the reasoning tier
Claude Opus 4.6 — reasoning-class, available for the reasoning tier
Claude Sonnet 4.6 — balanced performance, used for standard and fast tiers
Claude Haiku 4.5 — fast and efficient, used for micro tier

DeepSeek

Alternative provider, V4 generation (released April 2026):

DeepSeek V4 Pro — frontier reasoning, GPT-5 / Gemini-3.0-Pro tier — sits at the reasoning tier as an Opus alternative
DeepSeek V4 Flash — general workhorse with 1M context and thinking + tool-use in both modes — sits at the standard / fast tier as a Sonnet alternative

Moonshot (Kimi)

Kimi K2.6 — latest flagship, 262k context
Kimi K2.5 — reasoning-capable model

Z.ai (GLM)

Zhipu GLM models via Z.ai's Claude Code integration:

GLM 5.1 — flagship agentic coder
GLM 5 — reasoning-class
GLM 5 Turbo — faster agent-tier model
GLM 4.7 — agentic coder
GLM 4.6 — reasoning-class
GLM 4.5 Air — faster, lower-intelligence variant

Qwen (Alibaba Cloud)

Qwen3 family via Alibaba DashScope's Claude Code integration:

Qwen3.7 Max — frontier flagship (1M context, native thinking)
Qwen3.7 Plus — multimodal agent flagship
Qwen3.6 Plus — recommended-default general (1M context, multimodal)
Qwen3 Coder Plus — flagship coder (context cache, tiered billing)
Qwen3.5 Plus — general flagship (1M context, multimodal)
Qwen3 Max — Max-series, maximum quality for complex reasoning
Qwen3 Coder Next — balanced coder (262k context, agentic tool calling)
Qwen3 Coder Flash — fast coder
Qwen3.6 Flash — recommended-default fast general (1M context, multimodal)
Qwen3.5 Flash — fast general (1M context, multimodal)

Ollama

Local model support for private, on-machine execution:

Qwen3 Coder Next — code-focused model
Qwen3.6 27B — current-gen general/coding model (256k context)
Qwen3 Coder — code-focused model
GLM 4.7 — general-purpose
DeepSeek Coder — code-focused
Qwen 3.5 9B — small general-purpose model

OpenAI (via Codex)

GPT-5 models served through the Codex harness (authenticated via the Codex CLI, not an API key):

GPT-5.5 — frontier reasoning, sits at the reasoning tier
GPT-5.4 — general workhorse for the standard / fast tiers
GPT-5.4 Mini — fast, lower-cost variant for the fast / micro tiers

Configuring Models

Navigate to Settings
Find the AI Models section
For each tier, pick a harness → provider → model cascade:
- Harness — the CLI/runtime that actually executes the agent. Choose Claude Code (which talks to every Anthropic-compat provider below — Anthropic, DeepSeek, Moonshot, Z.ai, Qwen, Ollama) or Codex (OpenAI's GPT-5 models).
- Provider — which vendor runs the model.
- Model — the specific model within that provider.

Switching providers restores the last model you picked for that (harness, provider) pair, so you can A/B between two providers without losing your selection. Settings are stored as harness:provider:model strings (e.g., claude-code:anthropic:claude-opus-4-8).

Defaults

Tier	Default Model
Reasoning	Claude Opus 4.8
Standard	Claude Sonnet 4.6
Fast	Claude Sonnet 4.6
Micro	Claude Haiku 4.5

Retired models appear greyed out in the picker and aren't selectable. Stored settings that reference them keep working (and appear correctly in historical metrics) until you change them.

Dynamic Tier Resolution

Some operations dynamically choose between tiers based on context:

Story execution — stories with difficulty 4+ or large surface area automatically use the reasoning tier instead of standard
Planning pipeline — the architect phase uses reasoning, while calibrator and dependency-mapper use fast

This means a low-difficulty story costs less than a high-difficulty one, even though both go through the same pipeline.

Tier Default Resolution

Trinity picks which model to use in this priority order (highest wins):

Explicit — if the caller passes a specific model, that wins
Entity / job override — a model carried on a story's or release's automation overrides, or one passed when starting a run. There's no per-story/release model picker in the UI (the story Automation tab covers PR/merge toggles, not model choice), so this layer is set programmatically and rarely by hand
User per-project — your Project Settings → Mine model choice
Project default — the project's configured tier model
User per-scope — your Team Settings → Mine model choice
Team default — the team's configured tier model
Global default — your personal Settings → AI Models choice
TIER_FALLBACK_MODEL_ID — built-in Anthropic defaults used when no layer above has been configured

This is resolution-time only — it's not a runtime "retry with a fallback on failure." If a model call fails, the operation surfaces the error (the caller handles retries, usually by going through the feedback pipeline or job-level retry).

Effort Levels

Recent Anthropic models accept an effort parameter that controls how hard the model thinks before answering. Each model supports a different subset of the ladder:

Fable 5, Opus 4.8, Opus 4.7 — full 5-level: low | medium | high | xhigh | max
Opus 4.6, Sonnet 4.6 — 4-level: low | medium | high | max (no xhigh)
Haiku 4.5 — no effort parameter; the harness skips injecting it

Trinity's harness clamps every requested effort to what the chosen model supports, so a xhigh request against Sonnet 4.6 lands at high. Effort requests get recorded in the ai_events table for metrics.

Timeout Tiers

Each model tier has associated timeout limits:

Tier	Timeout
Micro	5 minutes
Fast (Short)	15 minutes
Standard (Default)	30 minutes
Reasoning (Long)	1 hour

Operations that exceed their timeout are cancelled and marked as failed.

Cost Considerations

Model costs vary significantly:

Reasoning tier is the most expensive — use it only where quality matters (Trinity does this automatically for hard stories)
Standard tier offers the best quality-to-cost ratio for most work
Fast tier is cost-effective for bounded tasks that don't need deep reasoning
Micro tier is very cheap, used for mechanical classification

The Metrics dashboard tracks token usage and cost by operation, helping you understand where your budget goes.

Tips

Start with defaults — the default configuration is well-balanced for most projects
Use DeepSeek for cost savings — if you have a DeepSeek API key, using it for the fast or standard tier can reduce costs significantly
Use Ollama for privacy — local models keep all data on your machine, but expect slower execution and potentially lower quality
Monitor the cost tab — check Metrics → Cost to understand your spending patterns before making changes
Don't downgrade reasoning — the reasoning tier handles your most complex stories; using a less capable model here leads to more failures and retries, which can cost more in the end