Leaderboard

Top three models per use case, updated from the latest benchmarking API.

Use case

Complex Reasoning 🧠

OpenAI GPT-5: 92.20%
xAI Grok 4: 81.80%
OpenAI o3: 75.30%

Constraint Bench 🧯

OpenAI GPT-5: 79.20%
OpenAI o3: 57.20%
xAI Grok 4: 33.30%

Video Generation 🎬

Veo 3 (without audio): 1267.99
Veo 2: 1135.46
Runway Gen 3: 1038.65

Image Generation 🖼️

GPT Image 1: 1069.17
GPT-4.1: 1039.62
Recraft v3: 1039.37

Speech Generation 🎙️

Cartesia Sonic 2: 1129.46
OpenAI TTS: 1118.15
Cartesia Sonic 1: 1074.51

Multimodal Reasoning 🛰️

Claude 3.5 Sonnet: 1220.00
Gemini 2.0 Flash: 1119.62
OpenAI o1: 1105.56

Agentic Search 🔎

OpenAI GPT-4.1 (Search): 82.00%
Google Gemini 2.5 Pro (Search): 78.00%
Anthropic Claude 4.0 Opus (Search): 72.00%

Deep Research Agent 🧭

Google Gemini 2.5 Pro: 77.55%
OpenAI GPT-4: 71.46%
Anthropic Claude 4.0 Opus: 69.05%