LLM Evals

LangSmith vs Braintrust

LangSmith 53%, Braintrust 47%

Leading: LangSmith (52.8%)

Metric	Value
LangSmith wins	902
Braintrust wins	806
Abstains (no tool)	105
Other tool chosen	862
Decisive cases	1708
LangSmith win rate (unweighted)	52.8%
95% CI	50.4% - 55.2%
LangSmith win rate (weighted)	52.8%
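
The headline figures above can be reproduced from the two win counts. The sketch below is a minimal illustration and assumes the 95% CI is a normal-approximation (Wald) interval over decisive cases only. The page does not state which interval method it uses, but this one matches the reported bounds. The function name `win_rate_with_ci` is hypothetical.

```python
import math


def win_rate_with_ci(wins_a, wins_b, z=1.96):
    """Return (decisive cases, win rate of A, CI lower bound, CI upper bound).

    Assumption: a Wald (normal-approximation) interval; abstains and
    "other tool" answers are excluded from the denominator.
    """
    n = wins_a + wins_b                      # decisive cases
    p = wins_a / n                           # unweighted win rate of A
    half_width = z * math.sqrt(p * (1 - p) / n)
    return n, p, p - half_width, p + half_width


# Counts taken from the metric table: LangSmith wins, Braintrust wins.
n, p, lo, hi = win_rate_with_ci(902, 806)
print(f"{n} decisive cases, win rate {p:.1%}, 95% CI {lo:.1%} - {hi:.1%}")
```

Because the lower bound sits just above 50%, the overall lead is narrow. The per-model rows show that the aggregate hides strong splits in both directions.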

Model	Tier	LangSmith	Braintrust	None	Other	LangSmith win rate
GPT 5.3 Codex	Frontier	8	136	0	0	6%
Claude Haiku 4.5	Small	14	113	1	9	11%
Claude Sonnet 4.6	Frontier	58	65	1	20	47%
DeepSeek R1 0528	Frontier	119	0	7	18	100%
Gemini 2.5 Pro	Frontier	116	1	9	18	99%
Claude Opus 4.6	Frontier	1	113	0	18	1%
GPT 5.4	Frontier	18	92	0	22	16%
Kimi K2.5	Frontier	0	109	3	7	0%
GPT 5.4 Mini	Mid	102	4	3	34	96%
GLM 5 Turbo	Frontier	18	84	19	11	18%
Mistral Small 4	Mid	101	0	2	33	100%
Qwen3 Coder Next	Mid	95	0	3	43	100%
MiniMax M2.7	Frontier	62	31	5	31	67%
DeepSeek V3.2	Mid	80	0	22	26	100%
MiMo V2 Pro	Frontier	62	1	8	61	98%
Llama 4 Maverick	Frontier	22	0	2	113	100%
GPT 5.5	Frontier	0	12	0	0	0%
MiniMax M3	Frontier	0	10	1	1	0%
DeepSeek V4 Pro	Frontier	5	4	1	2	56%
Gemini 3.5 Flash	Small	0	9	1	2	0%
GLM 5.2	Frontier	0	9	0	3	0%
MiMo V2.5 Pro	Frontier	3	5	0	4	38%
Devstral 2 2512	Mid	6	0	6	123	100%
Claude Opus 4.8	Frontier	5	1	1	5	83%
Kimi K2.7 Code	Frontier	0	6	1	5	0%
DeepSeek V4 Flash	Mid	4	1	1	6	80%
Gemini 2.5 Flash	Small	3	0	1	123	100%
Llama 4 Scout	Small	0	0	7	124	n/a
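
The final percentage column in the table above appears to be LangSmith's share of decisive cases in each row, with "None" and "Other" answers left out and `n/a` shown when a row has no decisive cases. That reading is inferred from the numbers, not stated on the page. A minimal sketch under that assumption, using the hypothetical helper name `langsmith_rate`:

```python
def langsmith_rate(langsmith, braintrust):
    """Format LangSmith's share of decisive cases as a whole percentage.

    Assumption: "None" and "Other" counts are ignored, and a row with
    zero decisive cases is reported as "n/a".
    """
    decisive = langsmith + braintrust
    if decisive == 0:
        return "n/a"
    # Round half up to the nearest whole percent.
    return f"{int(100 * langsmith / decisive + 0.5)}%"


# Rows taken from the table: GPT 5.3 Codex, Claude Sonnet 4.6, Llama 4 Scout.
print(langsmith_rate(8, 136))
print(langsmith_rate(58, 65))
print(langsmith_rate(0, 0))
```

Rows with few decisive cases, such as DeepSeek V4 Flash with 5, give rates that move a lot on a single answer, so they deserve less weight than rows with 100 or more.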

Prompt	Tier	LangSmith	Braintrust	None	Other	LangSmith win rate
ai-revenue-ops-copilot	Intermediate	189	130	4	101	59%
ai-support-agent-platform	Intermediate	207	92	5	124	69%
ai-revenue-ops-copilot	Advanced	128	164	2	125	44%
ai-revenue-ops-copilot	Beginner	126	151	10	143	45%
ai-support-agent-platform	Advanced	142	130	5	153	52%
ai-support-agent-platform	Beginner	87	100	66	178	47%
ai-engineering-workflow	Advanced	2	12	0	4	14%
ai-agent-application	Beginner	8	3	5	4	73%
ai-agent-application	Intermediate	5	6	0	9	45%
ai-agent-application	Advanced	3	7	0	8	30%
ai-engineering-workflow	Intermediate	2	8	0	7	20%
ai-engineering-workflow	Beginner	3	3	8	6	50%