Preseason

© 2026 Preseason. All rights reserved.

@betocmn
LLM Evals

Vellum vs Arize Phoenix

Vellum 48% vs Arize Phoenix 52%

Leading: Arize Phoenix (52.5%)

Statistics

Metric                        Value
Vellum wins                   19
Arize Phoenix wins            21
Abstains (no tool)            105
Other tool chosen             2530
Decisive cases                40
Vellum win rate (unweighted)  47.5%
95% CI                        32.9% - 62.5%
Vellum win rate (weighted)    47.5%
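
The win rate and interval above can be reproduced from the two win counts. The page does not state which interval method it uses; the sketch below assumes a Wilson score interval at z = 1.96, which happens to match the reported bounds. The function name `wilson_interval` is illustrative, not part of any published Preseason code.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n == 0:
        raise ValueError("no decisive cases")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts taken from the Statistics table.
vellum_wins, phoenix_wins = 19, 21
decisive = vellum_wins + phoenix_wins  # abstains and "other tool" picks are excluded

win_rate = vellum_wins / decisive
low, high = wilson_interval(vellum_wins, decisive)

print(f"Vellum win rate: {win_rate:.1%}")  # 47.5%
print(f"95% CI: {low:.1%} - {high:.1%}")   # 32.9% - 62.5%
```

Only the 40 decisive cases enter the calculation, which is why the interval is wide despite more than 2,600 total responses.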


Per-model breakdown

Model              Tier      Vellum  Arize Phoenix  None  Other  A rate (Vellum)
Devstral 2 2512    Mid       18      0              6     111    100%
Qwen3 Coder Next   Mid       0       8              3     130    0%
MiniMax M2.7       Frontier  0       5              5     119    0%
GPT 5.4 Mini       Mid       0       4              3     136    0%
Llama 4 Scout      Small     0       2              7     122    0%
Mistral Small 4    Mid       0       2              2     132    0%
MiMo V2 Pro        Frontier  1       0              8     123    100%
Claude Haiku 4.5   Small     0       0              1     136    n/a
Claude Opus 4.6    Frontier  0       0              0     132    n/a
Claude Opus 4.8    Frontier  0       0              1     11     n/a
Claude Sonnet 4.6  Frontier  0       0              1     143    n/a
DeepSeek R1 0528   Frontier  0       0              7     137    n/a
DeepSeek V3.2      Mid       0       0              22    106    n/a
DeepSeek V4 Flash  Mid       0       0              1     11     n/a
DeepSeek V4 Pro    Frontier  0       0              1     11     n/a
Gemini 2.5 Flash   Small     0       0              1     126    n/a
Gemini 2.5 Pro     Frontier  0       0              9     135    n/a
Gemini 3.5 Flash   Small     0       0              1     11     n/a
GLM 5 Turbo        Frontier  0       0              19    113    n/a
GLM 5.2            Frontier  0       0              0     12     n/a
GPT 5.3 Codex      Frontier  0       0              0     144    n/a
GPT 5.4            Frontier  0       0              0     132    n/a
GPT 5.5            Frontier  0       0              0     12     n/a
Kimi K2.5          Frontier  0       0              3     116    n/a
Kimi K2.7 Code     Frontier  0       0              1     11     n/a
Llama 4 Maverick   Frontier  0       0              2     135    n/a
MiMo V2.5 Pro      Frontier  0       0              0     12     n/a
MiniMax M3         Frontier  0       0              1     11     n/a
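
The "A rate" column appears to be Vellum wins divided by decisive cases (Vellum wins plus Arize Phoenix wins), with "n/a" shown when a model never picked either tool. A minimal sketch under that assumption, using three rows from the table above; the helper name `a_rate` is hypothetical.

```python
from typing import Optional


def a_rate(a_wins: int, b_wins: int) -> Optional[float]:
    """Share of decisive cases won by tool A; None when there are no decisive cases."""
    decisive = a_wins + b_wins
    return None if decisive == 0 else a_wins / decisive


# (Vellum, Arize Phoenix, None, Other) for a few models from the table.
rows = {
    "Devstral 2 2512": (18, 0, 6, 111),
    "Qwen3 Coder Next": (0, 8, 3, 130),
    "Claude Opus 4.6": (0, 0, 0, 132),
}

labels = {}
for model, (vellum, phoenix, _none, _other) in rows.items():
    rate = a_rate(vellum, phoenix)
    labels[model] = "n/a" if rate is None else f"{rate:.0%}"
    print(f"{model}: {labels[model]}")
```

Under this reading, the "None" and "Other" counts never affect a model's rate, so a single decisive pick (as with MiMo V2 Pro) is enough to produce 100%.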

Per-prompt breakdown

Prompt                     Tier          Vellum  Arize Phoenix  None  Other  A rate (Vellum)
ai-support-agent-platform  Intermediate  11      1              5     411    92%
ai-support-agent-platform  Beginner      4       7              66    354    36%
ai-revenue-ops-copilot     Beginner      2       6              10    412    25%
ai-revenue-ops-copilot     Intermediate  2       3              4     415    40%
ai-revenue-ops-copilot     Advanced      0       4              2     413    0%
ai-agent-application       Intermediate  0       0              0     20     n/a
ai-agent-application       Advanced      0       0              0     18     n/a
ai-agent-application       Beginner      0       0              5     15     n/a
ai-engineering-workflow    Advanced      0       0              0     18     n/a
ai-engineering-workflow    Beginner      0       0              8     12     n/a
ai-engineering-workflow    Intermediate  0       0              0     17     n/a
ai-support-agent-platform  Advanced      0       0              5     425    n/a
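
As a consistency check, the per-prompt rows can be summed and compared with the headline Statistics figures. The column values below are one reading of the table (the digits are fused in the page text); this reading is the one whose totals match 19, 21, 105, and 2530. The variable names are illustrative.

```python
# (prompt, tier, Vellum, Arize Phoenix, None, Other), as read from the per-prompt table.
prompt_rows = [
    ("ai-support-agent-platform", "Intermediate", 11, 1, 5, 411),
    ("ai-support-agent-platform", "Beginner", 4, 7, 66, 354),
    ("ai-revenue-ops-copilot", "Beginner", 2, 6, 10, 412),
    ("ai-revenue-ops-copilot", "Intermediate", 2, 3, 4, 415),
    ("ai-revenue-ops-copilot", "Advanced", 0, 4, 2, 413),
    ("ai-agent-application", "Intermediate", 0, 0, 0, 20),
    ("ai-agent-application", "Advanced", 0, 0, 0, 18),
    ("ai-agent-application", "Beginner", 0, 0, 5, 15),
    ("ai-engineering-workflow", "Advanced", 0, 0, 0, 18),
    ("ai-engineering-workflow", "Beginner", 0, 0, 8, 12),
    ("ai-engineering-workflow", "Intermediate", 0, 0, 0, 17),
    ("ai-support-agent-platform", "Advanced", 0, 0, 5, 425),
]

# Sum each count column across all prompt/tier rows.
totals = [sum(row[i] for row in prompt_rows) for i in range(2, 6)]
vellum, phoenix, abstains, other = totals

print(f"Vellum wins: {vellum}")
print(f"Arize Phoenix wins: {phoenix}")
print(f"Abstains: {abstains}")
print(f"Other tool chosen: {other}")
print(f"Decisive cases: {vellum + phoenix}")
```

All of Vellum's wins come from two prompts, and 11 of the 19 come from a single prompt and tier, so the headline rate is sensitive to the prompt mix.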