Preseason

© 2026 Preseason. All rights reserved.

@betocmn

Patronus AI vs LMSYS Chatbot Arena

Patronus AI 50% vs LMSYS Chatbot Arena 50%
Insufficient data
This matchup has 14 decisive cases (minimum 30 required for publication).
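
The publication rule above can be sketched as a small check. This is a minimal sketch, assuming a decisive case is one where either of the two compared tools was chosen (consistent with 7 + 7 = 14 on this page). The function and constant names are hypothetical and are not taken from Preseason's code.

```python
MIN_DECISIVE_CASES = 30  # minimum quoted on this page


def decisive_cases(wins_a: int, wins_b: int) -> int:
    """Cases where one of the two compared tools was chosen (assumed definition)."""
    return wins_a + wins_b


def is_publishable(wins_a: int, wins_b: int, minimum: int = MIN_DECISIVE_CASES) -> bool:
    """True when the matchup has at least `minimum` decisive cases."""
    return decisive_cases(wins_a, wins_b) >= minimum


# Patronus AI: 7 wins, LMSYS Chatbot Arena: 7 wins
print(decisive_cases(7, 7), is_publishable(7, 7))  # prints "14 False"
```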

Statistics

| Metric | Value |
| --- | --- |
| Patronus AI wins | 7 |
| LMSYS Chatbot Arena wins | 7 |
| Abstains (no tool) | 105 |
| Other tool chosen | 2556 |
| Decisive cases | 14 |
| Patronus AI win rate (unweighted) | 50.0% |
| 95% CI | 26.8% - 73.2% |
| Patronus AI win rate (weighted) | 50.0% |
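
The page does not say how the 95% CI is computed. A Wilson score interval with z = 1.96 reproduces the reported 26.8% - 73.2% for 7 wins out of 14 decisive cases, so the sketch below assumes that method. The function name is illustrative only.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion wins/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(7, 14)
print(f"{low:.1%} - {high:.1%}")  # prints "26.8% - 73.2%"
```

With only 14 decisive cases the interval spans almost half of the possible range, which is why the matchup is flagged as insufficient data.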

Per-model breakdown

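| Model | Tier | Patronus AI | LMSYS Chatbot Arena | None | Other | A rate (Patronus AI) |
| --- | --- | --- | --- | --- | --- | --- |
| Llama 4 Scout | Small | 0 | 7 | 7 | 117 | 0% |
| MiMo V2 Pro | Frontier | 4 | 0 | 8 | 120 | 100% |
| Gemini 2.5 Pro | Frontier | 2 | 0 | 9 | 133 | 100% |
| GPT 5.4 | Frontier | 1 | 0 | 0 | 131 | 100% |
| Claude Haiku 4.5 | Small | 0 | 0 | 1 | 136 | n/a |
| Claude Opus 4.6 | Frontier | 0 | 0 | 0 | 132 | n/a |
| Claude Opus 4.8 | Frontier | 0 | 0 | 1 | 11 | n/a |
| Claude Sonnet 4.6 | Frontier | 0 | 0 | 1 | 143 | n/a |
| DeepSeek R1 0528 | Frontier | 0 | 0 | 7 | 137 | n/a |
| DeepSeek V3.2 | Mid | 0 | 0 | 22 | 106 | n/a |
| DeepSeek V4 Flash | Mid | 0 | 0 | 1 | 11 | n/a |
| DeepSeek V4 Pro | Frontier | 0 | 0 | 1 | 11 | n/a |
| Devstral 2 2512 | Mid | 0 | 0 | 6 | 129 | n/a |
| Gemini 2.5 Flash | Small | 0 | 0 | 1 | 126 | n/a |
| Gemini 3.5 Flash | Small | 0 | 0 | 1 | 11 | n/a |
| GLM 5 Turbo | Frontier | 0 | 0 | 19 | 113 | n/a |
| GLM 5.2 | Frontier | 0 | 0 | 0 | 12 | n/a |
| GPT 5.3 Codex | Frontier | 0 | 0 | 0 | 144 | n/a |
| GPT 5.4 Mini | Mid | 0 | 0 | 3 | 140 | n/a |
| GPT 5.5 | Frontier | 0 | 0 | 0 | 12 | n/a |
| Kimi K2.5 | Frontier | 0 | 0 | 3 | 116 | n/a |
| Kimi K2.7 Code | Frontier | 0 | 0 | 1 | 11 | n/a |
| Llama 4 Maverick | Frontier | 0 | 0 | 2 | 135 | n/a |
| MiMo V2.5 Pro | Frontier | 0 | 0 | 0 | 12 | n/a |
| MiniMax M2.7 | Frontier | 0 | 0 | 5 | 124 | n/a |
| MiniMax M3 | Frontier | 0 | 0 | 1 | 11 | n/a |
| Mistral Small 4 | Mid | 0 | 0 | 2 | 134 | n/a |
| Qwen3 Coder Next | Mid | 0 | 0 | 3 | 138 | n/a |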
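
The "A rate" column appears to be Patronus AI's share of decisive cases in each row, with "n/a" where a model never chose either tool. A minimal sketch under that assumption follows; the helper names are hypothetical.

```python
from typing import Optional


def a_rate(wins_a: int, wins_b: int) -> Optional[float]:
    """Share of decisive cases won by tool A; None when there are no decisive cases."""
    decisive = wins_a + wins_b
    if decisive == 0:
        return None
    return wins_a / decisive


def format_rate(rate: Optional[float]) -> str:
    """Render a rate the way the table does: whole percent, or 'n/a'."""
    return "n/a" if rate is None else f"{rate:.0%}"


# (Patronus AI wins, LMSYS Chatbot Arena wins) for three rows of the table
sample = {"Llama 4 Scout": (0, 7), "MiMo V2 Pro": (4, 0), "Claude Opus 4.6": (0, 0)}
for model, (a, b) in sample.items():
    print(model, format_rate(a_rate(a, b)))
```

Returning `None` rather than 0 for rows without decisive cases keeps "never chose either tool" distinct from "always chose the other tool".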

Per-prompt breakdown

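| Prompt | Tier | Patronus AI | LMSYS Chatbot Arena | None | Other | A rate (Patronus AI) |
| --- | --- | --- | --- | --- | --- | --- |
| ai-support-agent-platform | Advanced | 2 | 3 | 5 | 420 | 40% |
| ai-revenue-ops-copilot | Advanced | 4 | 0 | 2 | 413 | 100% |
| ai-revenue-ops-copilot | Beginner | 0 | 2 | 10 | 418 | 0% |
| ai-support-agent-platform | Intermediate | 0 | 2 | 5 | 421 | 0% |
| ai-support-agent-platform | Beginner | 1 | 0 | 66 | 364 | 100% |
| ai-agent-application | Intermediate | 0 | 0 | 0 | 20 | n/a |
| ai-agent-application | Advanced | 0 | 0 | 0 | 18 | n/a |
| ai-agent-application | Beginner | 0 | 0 | 5 | 15 | n/a |
| ai-engineering-workflow | Advanced | 0 | 0 | 0 | 18 | n/a |
| ai-engineering-workflow | Beginner | 0 | 0 | 8 | 12 | n/a |
| ai-engineering-workflow | Intermediate | 0 | 0 | 0 | 17 | n/a |
| ai-revenue-ops-copilot | Intermediate | 0 | 0 | 4 | 420 | n/a |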
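
As a sanity check, the per-prompt rows can be summed and compared with the headline statistics. The sketch below uses the counts as read from the rows above; the tuple layout and variable names are illustrative choices, not part of the site.

```python
# (prompt, tier, patronus_wins, lmsys_wins, none, other), as read from the per-prompt rows
prompt_rows = [
    ("ai-support-agent-platform", "Advanced", 2, 3, 5, 420),
    ("ai-revenue-ops-copilot", "Advanced", 4, 0, 2, 413),
    ("ai-revenue-ops-copilot", "Beginner", 0, 2, 10, 418),
    ("ai-support-agent-platform", "Intermediate", 0, 2, 5, 421),
    ("ai-support-agent-platform", "Beginner", 1, 0, 66, 364),
    ("ai-agent-application", "Intermediate", 0, 0, 0, 20),
    ("ai-agent-application", "Advanced", 0, 0, 0, 18),
    ("ai-agent-application", "Beginner", 0, 0, 5, 15),
    ("ai-engineering-workflow", "Advanced", 0, 0, 0, 18),
    ("ai-engineering-workflow", "Beginner", 0, 0, 8, 12),
    ("ai-engineering-workflow", "Intermediate", 0, 0, 0, 17),
    ("ai-revenue-ops-copilot", "Intermediate", 0, 0, 4, 420),
]

# Column-wise sums over the four count columns
totals = [sum(row[i] for row in prompt_rows) for i in range(2, 6)]
print(totals)  # prints "[7, 7, 105, 2556]"
```

The sums match the Statistics section: 7 wins each, 105 abstains, and 2556 cases where another tool was chosen.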