Preseason

© 2026 Preseason. All rights reserved.

@betocmn
LLM Evals

Arize AI vs Ragas

Arize AI 48% vs Ragas 52%

Leading: Ragas (51.8%)

Statistics

| Metric | Value |
| --- | --- |
| Arize AI wins | 108 |
| Ragas wins | 116 |
| Abstains (no tool) | 105 |
| Other tool chosen | 2346 |
| Decisive cases | 224 |
| Arize AI win rate (unweighted) | 48.2% |
| 95% CI | 41.8% - 54.7% |
| Arize AI win rate (weighted) | 48.2% |
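
The unweighted win rate and its interval can be re-derived from the two win counts. The sketch below is an assumption about the method, not a statement of it: the page does not say which interval it uses, but a Wilson score interval on 108 wins out of 224 decisive cases gives the same bounds to one decimal place. The function name `wilson_interval` is illustrative, and the weighted rate is left out because the weights are not given here.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z defaults to ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the Statistics table.
arize_wins, ragas_wins = 108, 116

# Decisive cases exclude abstains and runs where another tool was chosen.
decisive = arize_wins + ragas_wins
rate = arize_wins / decisive
low, high = wilson_interval(arize_wins, decisive)

print(f"decisive={decisive} rate={rate:.1%} ci={low:.1%} - {high:.1%}")
```

With only 224 decisive cases the interval spans 50%, so the 51.8% lead for Ragas is not distinguishable from an even split on these counts.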

Per-model breakdown

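| Model | Tier | Arize AI | Ragas | None | Other | A rate |
| --- | --- | --- | --- | --- | --- | --- |
| Llama 4 Scout | Small | 41 | 3 | 7 | 80 | 93% |
| Gemini 2.5 Flash | Small | 27 | 0 | 1 | 99 | 100% |
| MiniMax M2.7 | Frontier | 1 | 24 | 5 | 99 | 4% |
| MiMo V2 Pro | Frontier | 0 | 24 | 8 | 100 | 0% |
| Claude Opus 4.6 | Frontier | 0 | 18 | 0 | 114 | 0% |
| Llama 4 Maverick | Frontier | 13 | 0 | 2 | 122 | 100% |
| Mistral Small 4 | Mid | 3 | 9 | 2 | 122 | 25% |
| Gemini 2.5 Pro | Frontier | 10 | 0 | 9 | 125 | 100% |
| Devstral 2 2512 | Mid | 9 | 1 | 6 | 119 | 90% |
| GPT 5.4 Mini | Mid | 0 | 9 | 3 | 131 | 0% |
| Claude Sonnet 4.6 | Frontier | 0 | 7 | 1 | 136 | 0% |
| GLM 5 Turbo | Frontier | 0 | 6 | 19 | 107 | 0% |
| DeepSeek R1 0528 | Frontier | 3 | 0 | 7 | 134 | 100% |
| Kimi K2.7 Code | Frontier | 0 | 3 | 1 | 8 | 0% |
| Claude Opus 4.8 | Frontier | 0 | 2 | 1 | 9 | 0% |
| DeepSeek V3.2 | Mid | 0 | 2 | 22 | 104 | 0% |
| GLM 5.2 | Frontier | 0 | 2 | 0 | 10 | 0% |
| MiMo V2.5 Pro | Frontier | 0 | 2 | 0 | 10 | 0% |
| Qwen3 Coder Next | Mid | 1 | 0 | 3 | 137 | 100% |
| DeepSeek V4 Flash | Mid | 0 | 1 | 1 | 10 | 0% |
| GPT 5.4 | Frontier | 0 | 1 | 0 | 131 | 0% |
| Kimi K2.5 | Frontier | 0 | 1 | 3 | 115 | 0% |
| MiniMax M3 | Frontier | 0 | 1 | 1 | 10 | 0% |
| Claude Haiku 4.5 | Small | 0 | 0 | 1 | 136 | n/a |
| DeepSeek V4 Pro | Frontier | 0 | 0 | 1 | 11 | n/a |
| Gemini 3.5 Flash | Small | 0 | 0 | 1 | 11 | n/a |
| GPT 5.3 Codex | Frontier | 0 | 0 | 0 | 144 | n/a |
| GPT 5.5 | Frontier | 0 | 0 | 0 | 12 | n/a |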

Per-prompt breakdown

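| Prompt | Tier | Arize AI | Ragas | None | Other | A rate |
| --- | --- | --- | --- | --- | --- | --- |
| ai-support-agent-platform | Advanced | 18 | 37 | 5 | 370 | 33% |
| ai-support-agent-platform | Beginner | 19 | 34 | 66 | 312 | 36% |
| ai-revenue-ops-copilot | Advanced | 23 | 9 | 2 | 385 | 72% |
| ai-revenue-ops-copilot | Beginner | 15 | 15 | 10 | 390 | 50% |
| ai-support-agent-platform | Intermediate | 20 | 8 | 5 | 395 | 71% |
| ai-revenue-ops-copilot | Intermediate | 12 | 2 | 4 | 406 | 86% |
| ai-agent-application | Intermediate | 0 | 5 | 0 | 15 | 0% |
| ai-agent-application | Advanced | 0 | 3 | 0 | 15 | 0% |
| ai-agent-application | Beginner | 0 | 3 | 5 | 12 | 0% |
| ai-engineering-workflow | Intermediate | 1 | 0 | 0 | 16 | 100% |
| ai-engineering-workflow | Advanced | 0 | 0 | 0 | 18 | n/a |
| ai-engineering-workflow | Beginner | 0 | 0 | 8 | 12 | n/a |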
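
The page does not define the "A rate" column. Assuming it is the share of Arize AI wins among decisive cases (Arize AI plus Ragas), with "n/a" when a row has no decisive cases, the minimal sketch below reproduces the listed values for three rows of the per-prompt table. The helper name `format_a_rate` is illustrative only.

```python
def format_a_rate(arize_wins: int, ragas_wins: int) -> str:
    """Arize AI share of decisive cases as a whole percent, or 'n/a' if none."""
    decisive = arize_wins + ragas_wins
    if decisive == 0:
        # No head-to-head outcome: abstains and other-tool picks are not counted.
        return "n/a"
    return f"{arize_wins / decisive:.0%}"


# (Arize AI wins, Ragas wins) copied from three per-prompt rows.
rows = {
    ("ai-support-agent-platform", "Advanced"): (18, 37),
    ("ai-revenue-ops-copilot", "Intermediate"): (12, 2),
    ("ai-engineering-workflow", "Advanced"): (0, 0),
}

for (prompt, tier), (arize, ragas) in rows.items():
    print(f"{prompt} / {tier}: {format_a_rate(arize, ragas)}")
```

Rows with very few decisive cases, such as a single win yielding 100%, carry little evidence on their own, so the rate is best read alongside the raw counts.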