Preseason

© 2026 Preseason. All rights reserved.

LLM Evals

Arize Phoenix vs Promptfoo

Arize Phoenix: 40%
Promptfoo: 60%

Leading: Promptfoo (59.6%)

Statistics

| Metric | Value |
| --- | --- |
| Arize Phoenix wins | 21 |
| Promptfoo wins | 31 |
| Abstains (no tool) | 105 |
| Other tool chosen | 2518 |
| Decisive cases | 52 |
| Arize Phoenix win rate (unweighted) | 40.4% |
| 95% CI | 28.2% - 53.9% |
| Arize Phoenix win rate (weighted) | 40.4% |
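
The unweighted win rate follows directly from the counts above: Arize Phoenix wins divided by decisive cases, where abstains and other-tool picks are left out. The page does not say how the 95% CI is computed; the sketch below assumes a Wilson score interval with z = 1.96, which happens to reproduce the reported bounds. The function name `wilson_interval` is illustrative, and the weighting scheme behind the weighted rate is not described on the page, so it is not modeled here.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n == 0:
        raise ValueError("no decisive cases")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


arize_wins, promptfoo_wins = 21, 31
# Decisive cases are those where one of the two tools was picked.
decisive = arize_wins + promptfoo_wins
win_rate = arize_wins / decisive
low, high = wilson_interval(arize_wins, decisive)

print(f"{win_rate:.1%} ({low:.1%} - {high:.1%})")  # 40.4% (28.2% - 53.9%)
```

With only 52 decisive cases the interval is wide and includes 50%, so the lead for Promptfoo is suggestive rather than conclusive.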

Per-model breakdown

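| Model | Tier | Arize Phoenix | Promptfoo | None | Other | Arize Phoenix rate |
| --- | --- | --- | --- | --- | --- | --- |
| Mistral Small 4 | Mid | 2 | 7 | 2 | 125 | 22% |
| Qwen3 Coder Next | Mid | 8 | 0 | 3 | 130 | 100% |
| GPT 5.4 Mini | Mid | 4 | 2 | 3 | 134 | 67% |
| MiMo V2 Pro | Frontier | 0 | 6 | 8 | 118 | 0% |
| MiniMax M2.7 | Frontier | 5 | 0 | 5 | 119 | 100% |
| GLM 5 Turbo | Frontier | 0 | 5 | 19 | 108 | 0% |
| Llama 4 Scout | Small | 2 | 0 | 7 | 122 | 100% |
| Gemini 3.5 Flash | Small | 0 | 2 | 1 | 9 | 0% |
| Kimi K2.5 | Frontier | 0 | 2 | 3 | 114 | 0% |
| Kimi K2.7 Code | Frontier | 0 | 2 | 1 | 9 | 0% |
| MiMo V2.5 Pro | Frontier | 0 | 2 | 0 | 10 | 0% |
| Claude Opus 4.8 | Frontier | 0 | 1 | 1 | 10 | 0% |
| DeepSeek V4 Pro | Frontier | 0 | 1 | 1 | 10 | 0% |
| GLM 5.2 | Frontier | 0 | 1 | 0 | 11 | 0% |
| Claude Haiku 4.5 | Small | 0 | 0 | 1 | 136 | n/a |
| Claude Opus 4.6 | Frontier | 0 | 0 | 0 | 132 | n/a |
| Claude Sonnet 4.6 | Frontier | 0 | 0 | 1 | 143 | n/a |
| DeepSeek R1 0528 | Frontier | 0 | 0 | 7 | 137 | n/a |
| DeepSeek V3.2 | Mid | 0 | 0 | 22 | 106 | n/a |
| DeepSeek V4 Flash | Mid | 0 | 0 | 1 | 11 | n/a |
| Devstral 2 2512 | Mid | 0 | 0 | 6 | 129 | n/a |
| Gemini 2.5 Flash | Small | 0 | 0 | 1 | 126 | n/a |
| Gemini 2.5 Pro | Frontier | 0 | 0 | 9 | 135 | n/a |
| GPT 5.3 Codex | Frontier | 0 | 0 | 0 | 144 | n/a |
| GPT 5.4 | Frontier | 0 | 0 | 0 | 132 | n/a |
| GPT 5.5 | Frontier | 0 | 0 | 0 | 12 | n/a |
| Llama 4 Maverick | Frontier | 0 | 0 | 2 | 135 | n/a |
| MiniMax M3 | Frontier | 0 | 0 | 1 | 11 | n/a |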
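
The rate in the last column appears to be Arize Phoenix wins divided by decisive cases for that model, with "n/a" shown when a model never picked either tool. A minimal sketch of that reading follows; the helper names `a_rate` and `format_rate` are hypothetical, and rounding to whole percents is an assumption based on the values displayed.

```python
from typing import Optional


def a_rate(arize_wins: int, promptfoo_wins: int) -> Optional[float]:
    """Share of decisive cases won by Arize Phoenix, or None if there are none."""
    decisive = arize_wins + promptfoo_wins
    if decisive == 0:
        return None
    return arize_wins / decisive


def format_rate(rate: Optional[float]) -> str:
    """Render a rate as a whole percent, or 'n/a' when it is undefined."""
    return "n/a" if rate is None else f"{rate:.0%}"


# (Arize Phoenix wins, Promptfoo wins) for a few rows of the breakdown.
sample = {
    "Mistral Small 4": (2, 7),
    "GPT 5.4 Mini": (4, 2),
    "Qwen3 Coder Next": (8, 0),
    "GPT 5.3 Codex": (0, 0),
}

for model, (arize, promptfoo) in sample.items():
    print(model, format_rate(a_rate(arize, promptfoo)))
```

Rates of 0% or 100% here rest on a handful of decisive cases per model, so they say little on their own.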

Per-prompt breakdown

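| Prompt | Tier | Arize Phoenix | Promptfoo | None | Other | Arize Phoenix rate |
| --- | --- | --- | --- | --- | --- | --- |
| ai-revenue-ops-copilot | Beginner | 6 | 4 | 10 | 410 | 60% |
| ai-revenue-ops-copilot | Advanced | 4 | 6 | 2 | 407 | 40% |
| ai-support-agent-platform | Beginner | 7 | 2 | 66 | 356 | 78% |
| ai-revenue-ops-copilot | Intermediate | 3 | 6 | 4 | 411 | 33% |
| ai-support-agent-platform | Intermediate | 1 | 5 | 5 | 417 | 17% |
| ai-engineering-workflow | Beginner | 0 | 3 | 8 | 9 | 0% |
| ai-engineering-workflow | Intermediate | 0 | 2 | 0 | 15 | 0% |
| ai-agent-application | Intermediate | 0 | 1 | 0 | 19 | 0% |
| ai-engineering-workflow | Advanced | 0 | 1 | 0 | 17 | 0% |
| ai-support-agent-platform | Advanced | 0 | 1 | 5 | 424 | 0% |
| ai-agent-application | Advanced | 0 | 0 | 0 | 18 | n/a |
| ai-agent-application | Beginner | 0 | 0 | 5 | 15 | n/a |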
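
As a consistency check, the per-prompt rows can be summed back to the headline statistics and rolled up across tiers. The sketch below uses the row values as read from this breakdown; the split between the "None" and "Other" counts in each row is the reading that makes the columns add up to the reported 105 abstains and 2518 other-tool picks. The cross-tier rollup is an illustration and is not a figure reported on the page.

```python
from collections import defaultdict

# (prompt, tier, Arize Phoenix wins, Promptfoo wins, none, other)
rows = [
    ("ai-revenue-ops-copilot", "Beginner", 6, 4, 10, 410),
    ("ai-revenue-ops-copilot", "Advanced", 4, 6, 2, 407),
    ("ai-support-agent-platform", "Beginner", 7, 2, 66, 356),
    ("ai-revenue-ops-copilot", "Intermediate", 3, 6, 4, 411),
    ("ai-support-agent-platform", "Intermediate", 1, 5, 5, 417),
    ("ai-engineering-workflow", "Beginner", 0, 3, 8, 9),
    ("ai-engineering-workflow", "Intermediate", 0, 2, 0, 15),
    ("ai-agent-application", "Intermediate", 0, 1, 0, 19),
    ("ai-engineering-workflow", "Advanced", 0, 1, 0, 17),
    ("ai-support-agent-platform", "Advanced", 0, 1, 5, 424),
    ("ai-agent-application", "Advanced", 0, 0, 0, 18),
    ("ai-agent-application", "Beginner", 0, 0, 5, 15),
]

# Column sums: Arize Phoenix wins, Promptfoo wins, abstains, other tool chosen.
totals = [sum(col) for col in zip(*(row[2:] for row in rows))]
print(totals)  # [21, 31, 105, 2518]

# Roll wins up per prompt, ignoring tier.
by_prompt = defaultdict(lambda: [0, 0])
for prompt, _tier, arize, promptfoo, _none, _other in rows:
    by_prompt[prompt][0] += arize
    by_prompt[prompt][1] += promptfoo

for prompt, (arize, promptfoo) in by_prompt.items():
    decisive = arize + promptfoo
    rate = f"{arize / decisive:.0%}" if decisive else "n/a"
    print(prompt, arize, promptfoo, rate)
```

Every Arize Phoenix win comes from the revenue-ops and support-agent prompts; the other two prompts produced only Promptfoo picks among their few decisive cases.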