LLM Evals

Braintrust vs LangSmith

Braintrust 47% vs LangSmith 53%

Leading: LangSmith (52.8%)

Metric	Value
Braintrust wins	806
LangSmith wins	902
Abstains (no tool)	105
Other tool chosen	862
Decisive cases	1708
Braintrust win rate (unweighted)	47.2%
95% CI	44.8% - 49.6%
Braintrust win rate (weighted)	47.2%
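The unweighted win rate and its interval can be reproduced from the raw counts. The sketch below assumes the 95% CI is a normal-approximation (Wald) interval on Braintrust's share of decisive cases. The page does not name its method, and a Wilson interval gives the same rounded bounds at this sample size. The function name `win_rate_ci` is hypothetical.

```python
import math


def win_rate_ci(wins_a: int, wins_b: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (rate, low, high) for tool A's share of decisive cases.

    Assumption: a normal-approximation (Wald) interval. Abstains and
    "other tool" outcomes are excluded because they are not decisive.
    """
    decisive = wins_a + wins_b
    rate = wins_a / decisive
    half_width = z * math.sqrt(rate * (1 - rate) / decisive)
    return rate, rate - half_width, rate + half_width


# Counts from the table above: Braintrust wins vs LangSmith wins.
rate, low, high = win_rate_ci(806, 902)
print(f"Braintrust win rate: {rate:.1%} (95% CI {low:.1%} - {high:.1%})")
# prints "Braintrust win rate: 47.2% (95% CI 44.8% - 49.6%)"
```

Only the 1708 decisive cases enter the denominator. The 105 abstains and 862 other-tool choices do not move the rate or the interval.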


Model	Tier	Braintrust	LangSmith	None (abstain)	Other tool	Braintrust win rate
GPT 5.3 Codex	Frontier	136	8	0	0	94%
Claude Haiku 4.5	Small	113	14	1	9	89%
Claude Sonnet 4.6	Frontier	65	58	1	20	53%
DeepSeek R1 0528	Frontier	0	119	7	18	0%
Gemini 2.5 Pro	Frontier	1	116	9	18	1%
Claude Opus 4.6	Frontier	113	1	0	18	99%
GPT 5.4	Frontier	92	18	0	22	84%
Kimi K2.5	Frontier	109	0	3	7	100%
GPT 5.4 Mini	Mid	4	102	3	34	4%
GLM 5 Turbo	Frontier	84	18	19	11	82%
Mistral Small 4	Mid	0	101	2	33	0%
Qwen3 Coder Next	Mid	0	95	3	43	0%
MiniMax M2.7	Frontier	31	62	5	31	33%
DeepSeek V3.2	Mid	0	80	22	26	0%
MiMo V2 Pro	Frontier	1	62	8	61	2%
Llama 4 Maverick	Frontier	0	22	2	113	0%
GPT 5.5	Frontier	12	0	0	0	100%
MiniMax M3	Frontier	10	0	1	1	100%
Gemini 3.5 Flash	Small	9	0	1	2	100%
GLM 5.2	Frontier	9	0	0	3	100%
DeepSeek V4 Pro	Frontier	4	5	1	2	44%
MiMo V2.5 Pro	Frontier	5	3	0	4	63%
Kimi K2.7 Code	Frontier	6	0	1	5	100%
Claude Opus 4.8	Frontier	1	5	1	5	17%
Devstral 2 2512	Mid	0	6	6	123	0%
DeepSeek V4 Flash	Mid	1	4	1	6	20%
Gemini 2.5 Flash	Small	0	3	1	123	0%
Llama 4 Scout	Small	0	0	7	124	n/a
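Each row's rate is Braintrust / (Braintrust + LangSmith). The None and Other columns are ignored, and the rate is undefined when a model produced no decisive cases (Llama 4 Scout). The sketch below uses round-half-up, which is an assumption. It is inferred from MiMo V2.5 Pro (5 of 8 decisive) being shown as 63% rather than 62%. Python's built-in `round` uses round-half-to-even and would give 62, so the sketch rounds with integer arithmetic instead. The helper name `row_rate` is hypothetical.

```python
from typing import Optional


def row_rate(braintrust: int, langsmith: int) -> Optional[int]:
    """Braintrust share of decisive cases as a whole percent, or None if n/a.

    Assumption: percentages are rounded half-up. Integer arithmetic
    avoids both float error and Python's round-half-to-even.
    """
    decisive = braintrust + langsmith
    if decisive == 0:
        return None  # shown as "n/a" in the table
    return (200 * braintrust + decisive) // (2 * decisive)


# A few rows from the per-model table: (model, Braintrust, LangSmith).
rows = [
    ("Claude Haiku 4.5", 113, 14),
    ("MiMo V2.5 Pro", 5, 3),
    ("Claude Opus 4.8", 1, 5),
    ("Llama 4 Scout", 0, 0),
]
for model, bt, ls in rows:
    rate = row_rate(bt, ls)
    label = "n/a" if rate is None else f"{rate}%"
    print(f"{model}: {label}")
```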

Prompt	Tier	Braintrust	LangSmith	None (abstain)	Other tool	Braintrust win rate
ai-revenue-ops-copilot	Intermediate	130	189	4	101	41%
ai-support-agent-platform	Intermediate	92	207	5	124	31%
ai-revenue-ops-copilot	Advanced	164	128	2	125	56%
ai-revenue-ops-copilot	Beginner	151	126	10	143	55%
ai-support-agent-platform	Advanced	130	142	5	153	48%
ai-support-agent-platform	Beginner	100	87	66	178	53%
ai-engineering-workflow	Advanced	12	2	0	4	86%
ai-agent-application	Intermediate	6	5	0	9	55%
ai-agent-application	Beginner	3	8	5	4	27%
ai-engineering-workflow	Intermediate	8	2	0	7	80%
ai-agent-application	Advanced	7	3	0	8	70%
ai-engineering-workflow	Beginner	3	3	8	6	50%
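The per-model and per-prompt breakdowns partition the same runs. The column totals of each table match the summary counts of 806, 902, 105, and 862. The sketch below checks this for the per-prompt table and then pools the three tiers into one rate per prompt. The row layout and the `ROWS`, `totals`, and `by_prompt` names are illustrative and do not come from the original page.

```python
from collections import defaultdict

# Per-prompt rows: (prompt, tier, braintrust, langsmith, none, other).
ROWS = [
    ("ai-revenue-ops-copilot", "Intermediate", 130, 189, 4, 101),
    ("ai-support-agent-platform", "Intermediate", 92, 207, 5, 124),
    ("ai-revenue-ops-copilot", "Advanced", 164, 128, 2, 125),
    ("ai-revenue-ops-copilot", "Beginner", 151, 126, 10, 143),
    ("ai-support-agent-platform", "Advanced", 130, 142, 5, 153),
    ("ai-support-agent-platform", "Beginner", 100, 87, 66, 178),
    ("ai-engineering-workflow", "Advanced", 12, 2, 0, 4),
    ("ai-agent-application", "Intermediate", 6, 5, 0, 9),
    ("ai-agent-application", "Beginner", 3, 8, 5, 4),
    ("ai-engineering-workflow", "Intermediate", 8, 2, 0, 7),
    ("ai-agent-application", "Advanced", 7, 3, 0, 8),
    ("ai-engineering-workflow", "Beginner", 3, 3, 8, 6),
]

# Column totals should equal the summary metrics table.
totals = [sum(row[i] for row in ROWS) for i in range(2, 6)]
assert totals == [806, 902, 105, 862], totals

# Pool tiers: sum Braintrust and LangSmith wins per prompt.
by_prompt: dict[str, list[int]] = defaultdict(lambda: [0, 0])
for prompt, _tier, bt, ls, _none, _other in ROWS:
    by_prompt[prompt][0] += bt
    by_prompt[prompt][1] += ls

for prompt, (bt, ls) in by_prompt.items():
    print(f"{prompt}: {bt}/{bt + ls} decisive -> {bt / (bt + ls):.1%} Braintrust")
```

Pooled across tiers, the two high-volume prompts behave differently. ai-revenue-ops-copilot is close to even (445 of 888 decisive), while ai-support-agent-platform leans toward LangSmith (322 of 758).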