Preseason
© 2026 Preseason. All rights reserved.

@betocmn
LLM Evals

Ragas vs Braintrust

Ragas 13% vs Braintrust 87%

Leading: Braintrust (87.4%)

Statistics

| Metric | Value |
| --- | --- |
| Ragas wins | 116 |
| Braintrust wins | 806 |
| Abstains (no tool) | 105 |
| Other tool chosen | 1648 |
| Decisive cases | 922 |
| Ragas win rate (unweighted) | 12.6% |
| 95% CI | 10.6% - 14.9% |
| Ragas win rate (weighted) | 12.6% |
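
The unweighted figures above can be reproduced from the two win counts alone. The sketch below assumes that decisive cases are Ragas wins plus Braintrust wins, and that the interval is a Wilson score interval at z = 1.96. The page does not name its interval method; Wilson is an assumption that happens to match the published bounds. The function name `wilson_interval` is illustrative, and the weighted rate is not covered because the weighting scheme is not described here.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion wins / n (assumed method)."""
    if n == 0:
        raise ValueError("no decisive cases")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


ragas_wins, braintrust_wins = 116, 806
decisive = ragas_wins + braintrust_wins  # abstains and other-tool picks are excluded

rate = ragas_wins / decisive
low, high = wilson_interval(ragas_wins, decisive)
print(f"decisive={decisive} rate={rate:.1%} ci={low:.1%} - {high:.1%}")
# prints: decisive=922 rate=12.6% ci=10.6% - 14.9%
```

The same counts give the Braintrust share shown in the headline: 806 / 922 is about 87.4%.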

Per-model breakdown

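| Model | Tier | Ragas | Braintrust | None | Other | A rate (Ragas) |
| --- | --- | --- | --- | --- | --- | --- |
| GPT 5.3 Codex | Frontier | 0 | 136 | 0 | 8 | 0% |
| Claude Opus 4.6 | Frontier | 18 | 113 | 0 | 1 | 14% |
| Claude Haiku 4.5 | Small | 0 | 113 | 1 | 23 | 0% |
| Kimi K2.5 | Frontier | 1 | 109 | 3 | 6 | 1% |
| GPT 5.4 | Frontier | 1 | 92 | 0 | 39 | 1% |
| GLM 5 Turbo | Frontier | 6 | 84 | 19 | 23 | 7% |
| Claude Sonnet 4.6 | Frontier | 7 | 65 | 1 | 71 | 10% |
| MiniMax M2.7 | Frontier | 24 | 31 | 5 | 69 | 44% |
| MiMo V2 Pro | Frontier | 24 | 1 | 8 | 99 | 96% |
| GPT 5.4 Mini | Mid | 9 | 4 | 3 | 127 | 69% |
| GPT 5.5 | Frontier | 0 | 12 | 0 | 0 | 0% |
| GLM 5.2 | Frontier | 2 | 9 | 0 | 1 | 18% |
| MiniMax M3 | Frontier | 1 | 10 | 1 | 0 | 9% |
| Mistral Small 4 | Mid | 9 | 0 | 2 | 125 | 100% |
| Kimi K2.7 Code | Frontier | 3 | 6 | 1 | 2 | 33% |
| Gemini 3.5 Flash | Small | 0 | 9 | 1 | 2 | 0% |
| MiMo V2.5 Pro | Frontier | 2 | 5 | 0 | 5 | 29% |
| DeepSeek V4 Pro | Frontier | 0 | 4 | 1 | 7 | 0% |
| Llama 4 Scout | Small | 3 | 0 | 7 | 121 | 100% |
| Claude Opus 4.8 | Frontier | 2 | 1 | 1 | 8 | 67% |
| DeepSeek V3.2 | Mid | 2 | 0 | 22 | 104 | 100% |
| DeepSeek V4 Flash | Mid | 1 | 1 | 1 | 9 | 50% |
| Devstral 2 2512 | Mid | 1 | 0 | 6 | 128 | 100% |
| Gemini 2.5 Pro | Frontier | 0 | 1 | 9 | 134 | 0% |
| DeepSeek R1 0528 | Frontier | 0 | 0 | 7 | 137 | n/a |
| Gemini 2.5 Flash | Small | 0 | 0 | 1 | 126 | n/a |
| Llama 4 Maverick | Frontier | 0 | 0 | 2 | 135 | n/a |
| Qwen3 Coder Next | Mid | 0 | 0 | 3 | 138 | n/a |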
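
The A rate column appears to be Ragas wins divided by decisive cases for that model, with None and Other left out of the denominator and n/a shown when a model never picked either tool. That reading is inferred from the rows, not stated on the page. A minimal sketch under that assumption, with the helper names `a_rate` and `fmt_rate` being illustrative:

```python
from typing import Optional


def a_rate(a_wins: int, b_wins: int) -> Optional[float]:
    """Share of decisive cases won by tool A, or None if there were none."""
    decisive = a_wins + b_wins
    if decisive == 0:
        return None  # neither tool was ever chosen
    return a_wins / decisive


def fmt_rate(rate: Optional[float]) -> str:
    """Render a rate as a whole percentage, or 'n/a' when undefined."""
    return "n/a" if rate is None else f"{rate:.0%}"


# (Ragas wins, Braintrust wins) for three rows of the table
examples = {
    "Claude Opus 4.6": (18, 113),
    "Mistral Small 4": (9, 0),
    "DeepSeek R1 0528": (0, 0),
}
for model, (ragas, braintrust) in examples.items():
    print(model, fmt_rate(a_rate(ragas, braintrust)))
# prints:
# Claude Opus 4.6 14%
# Mistral Small 4 100%
# DeepSeek R1 0528 n/a
```

Rates built on a handful of decisive cases, such as the 100% rows with single-digit win counts, carry far less evidence than the headline rate over 922 cases.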

Per-prompt breakdown

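| Prompt | Tier | Ragas | Braintrust | None | Other | A rate (Ragas) |
| --- | --- | --- | --- | --- | --- | --- |
| ai-revenue-ops-copilot | Advanced | 9 | 164 | 2 | 244 | 5% |
| ai-support-agent-platform | Advanced | 37 | 130 | 5 | 258 | 22% |
| ai-revenue-ops-copilot | Beginner | 15 | 151 | 10 | 254 | 9% |
| ai-support-agent-platform | Beginner | 34 | 100 | 66 | 231 | 25% |
| ai-revenue-ops-copilot | Intermediate | 2 | 130 | 4 | 288 | 2% |
| ai-support-agent-platform | Intermediate | 8 | 92 | 5 | 323 | 8% |
| ai-engineering-workflow | Advanced | 0 | 12 | 0 | 6 | 0% |
| ai-agent-application | Intermediate | 5 | 6 | 0 | 9 | 45% |
| ai-agent-application | Advanced | 3 | 7 | 0 | 8 | 30% |
| ai-engineering-workflow | Intermediate | 0 | 8 | 0 | 9 | 0% |
| ai-agent-application | Beginner | 3 | 3 | 5 | 9 | 50% |
| ai-engineering-workflow | Beginner | 0 | 3 | 8 | 9 | 0% |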
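
As a consistency check, the per-prompt rows can be summed column by column and compared with the headline statistics. The sketch below uses the counts as read from the per-prompt table; the tuple layout is just an illustrative choice.

```python
# (prompt, tier, Ragas, Braintrust, None, Other), as read from the per-prompt table
rows = [
    ("ai-revenue-ops-copilot", "Advanced", 9, 164, 2, 244),
    ("ai-support-agent-platform", "Advanced", 37, 130, 5, 258),
    ("ai-revenue-ops-copilot", "Beginner", 15, 151, 10, 254),
    ("ai-support-agent-platform", "Beginner", 34, 100, 66, 231),
    ("ai-revenue-ops-copilot", "Intermediate", 2, 130, 4, 288),
    ("ai-support-agent-platform", "Intermediate", 8, 92, 5, 323),
    ("ai-engineering-workflow", "Advanced", 0, 12, 0, 6),
    ("ai-agent-application", "Intermediate", 5, 6, 0, 9),
    ("ai-agent-application", "Advanced", 3, 7, 0, 8),
    ("ai-engineering-workflow", "Intermediate", 0, 8, 0, 9),
    ("ai-agent-application", "Beginner", 3, 3, 5, 9),
    ("ai-engineering-workflow", "Beginner", 0, 3, 8, 9),
]

# Sum each count column across all prompt/tier rows.
ragas, braintrust, abstains, other = (
    sum(row[i] for row in rows) for i in range(2, 6)
)
decisive = ragas + braintrust

print(ragas, braintrust, abstains, other, decisive)
# prints: 116 806 105 1648 922
```

The column sums match the Statistics section: 116 Ragas wins, 806 Braintrust wins, 105 abstains, 1648 other-tool picks, and 922 decisive cases.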