Preseason

© 2026 Preseason. All rights reserved.

@betocmn
LLM Evals

DeepEval vs Arize AI

DeepEval: 49%
Arize AI: 51%

Leading: Arize AI (51.2%)

Statistics

| Metric | Value |
| --- | --- |
| DeepEval wins | 103 |
| Arize AI wins | 108 |
| Abstains (no tool) | 105 |
| Other tool chosen | 2359 |
| Decisive cases | 211 |
| DeepEval win rate (unweighted) | 48.8% |
| 95% CI | 42.2% - 55.5% |
| DeepEval win rate (weighted) | 48.8% |
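The unweighted win rate is DeepEval wins divided by decisive cases (103 / 211). The page does not say how the 95% CI is computed; the sketch below assumes a Wilson score interval with z = 1.96, which appears to reproduce the reported bounds after rounding. The function name `wilson_interval` is illustrative, and the weighting scheme behind the weighted rate is not described in the source, so it is not modeled here.

```python
import math

def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Counts taken from the Statistics table.
deepeval_wins = 103
arize_wins = 108
decisive = deepeval_wins + arize_wins  # abstains and "other tool" are excluded

win_rate = deepeval_wins / decisive
low, high = wilson_interval(deepeval_wins, decisive)
print(f"DeepEval win rate: {win_rate:.1%}")
print(f"95% CI (Wilson, assumed): {low:.1%} - {high:.1%}")
```

Because the interval spans 50%, these counts alone do not separate the two tools.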


Per-model breakdown

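| Model | Tier | DeepEval | Arize AI | None | Other | A rate (DeepEval) |
| --- | --- | --- | --- | --- | --- | --- |
| Llama 4 Maverick | Frontier | 84 | 13 | 2 | 38 | 87% |
| Llama 4 Scout | Small | 0 | 41 | 7 | 83 | 0% |
| Gemini 2.5 Flash | Small | 0 | 27 | 1 | 99 | 0% |
| Gemini 2.5 Pro | Frontier | 0 | 10 | 9 | 125 | 0% |
| GPT 5.4 | Frontier | 9 | 0 | 0 | 123 | 100% |
| Devstral 2 2512 | Mid | 0 | 9 | 6 | 120 | 0% |
| DeepSeek V4 Flash | Mid | 4 | 0 | 1 | 7 | 100% |
| Kimi K2.5 | Frontier | 4 | 0 | 3 | 112 | 100% |
| DeepSeek R1 0528 | Frontier | 0 | 3 | 7 | 134 | 0% |
| Mistral Small 4 | Mid | 0 | 3 | 2 | 131 | 0% |
| DeepSeek V3.2 | Mid | 1 | 0 | 22 | 105 | 100% |
| DeepSeek V4 Pro | Frontier | 1 | 0 | 1 | 10 | 100% |
| MiniMax M2.7 | Frontier | 0 | 1 | 5 | 123 | 0% |
| Qwen3 Coder Next | Mid | 0 | 1 | 3 | 137 | 0% |
| Claude Haiku 4.5 | Small | 0 | 0 | 1 | 136 | n/a |
| Claude Opus 4.6 | Frontier | 0 | 0 | 0 | 132 | n/a |
| Claude Opus 4.8 | Frontier | 0 | 0 | 1 | 11 | n/a |
| Claude Sonnet 4.6 | Frontier | 0 | 0 | 1 | 143 | n/a |
| Gemini 3.5 Flash | Small | 0 | 0 | 1 | 11 | n/a |
| GLM 5 Turbo | Frontier | 0 | 0 | 19 | 113 | n/a |
| GLM 5.2 | Frontier | 0 | 0 | 0 | 12 | n/a |
| GPT 5.3 Codex | Frontier | 0 | 0 | 0 | 144 | n/a |
| GPT 5.4 Mini | Mid | 0 | 0 | 3 | 140 | n/a |
| GPT 5.5 | Frontier | 0 | 0 | 0 | 12 | n/a |
| Kimi K2.7 Code | Frontier | 0 | 0 | 1 | 11 | n/a |
| MiMo V2 Pro | Frontier | 0 | 0 | 8 | 124 | n/a |
| MiMo V2.5 Pro | Frontier | 0 | 0 | 0 | 12 | n/a |
| MiniMax M3 | Frontier | 0 | 0 | 1 | 11 | n/a |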

Per-prompt breakdown

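| Prompt | Tier | DeepEval | Arize AI | None | Other | A rate (DeepEval) |
| --- | --- | --- | --- | --- | --- | --- |
| ai-support-agent-platform | Advanced | 33 | 18 | 5 | 374 | 65% |
| ai-support-agent-platform | Beginner | 22 | 19 | 66 | 324 | 54% |
| ai-support-agent-platform | Intermediate | 21 | 20 | 5 | 382 | 51% |
| ai-revenue-ops-copilot | Advanced | 10 | 23 | 2 | 384 | 30% |
| ai-revenue-ops-copilot | Beginner | 7 | 15 | 10 | 398 | 32% |
| ai-revenue-ops-copilot | Intermediate | 5 | 12 | 4 | 403 | 29% |
| ai-agent-application | Advanced | 3 | 0 | 0 | 15 | 100% |
| ai-engineering-workflow | Intermediate | 1 | 1 | 0 | 15 | 50% |
| ai-engineering-workflow | Beginner | 1 | 0 | 8 | 11 | 100% |
| ai-agent-application | Beginner | 0 | 0 | 5 | 15 | n/a |
| ai-agent-application | Intermediate | 0 | 0 | 0 | 20 | n/a |
| ai-engineering-workflow | Advanced | 0 | 0 | 0 | 18 | n/a |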