Use case

Speech Generation

About this leaderboard

Benchmarks text-to-speech (TTS) models on intelligibility, natural prosody, speaker similarity, noise robustness, latency, and hallucination-free rendering across multiple voices and languages.

We stress-test models with curated simulations that blend structured benchmarks and open-ended prompts. Each table captures a distinct slice of the use case, and models are compared consistently across shared metrics to surface leaders, trade-offs, and surprising strengths.

Overall

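| Rank | Model            | Elo Rating | True Skill Rating | Average Rank | Rank 1 (%) |
|------|------------------|------------|-------------------|--------------|------------|
| 1    | Cartesia Sonic 2 | 1129.46    | 1075.99           | 1.3          | 70.1       |
| 2    | OpenAI TTS       | 1118.15    | 1035.23           | 1.29         | 70.91      |
| 3    | Cartesia Sonic 1 | 1074.51    | 1036.74           | 1.32         | 68.17      |
| 4    | ElevenLabs       | 1062.1     | 991               | 1.42         | 58.18      |
| 5    | AWS Polly        | 1056.84    | 985.48            | 1.41         | 59.34      |
| 6    | Kokoro           | 989.28     | 978.48            | 1.58         | 41.52      |
| 7    | Google TTS       | 940.31     | 869.29            | 1.87         | 13.21      |
| 8    | XTTS V2          | 872.43     | 845.35            | 1.73         | 27.4       |
| 9    | Deepgram         | 756.92     | 823.46            | 1.59         | 40.61      |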
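
To help read the Elo Rating column, here is a minimal sketch of the textbook Elo formulas. It is an illustration only: this page does not state how its ratings are computed, so the 400-point scale, the K-factor of 32, and the function names `elo_expected_score` and `elo_update` are all assumptions rather than the leaderboard's actual method.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo model.

    The 400-point scale is the conventional default and is an assumption
    here; the leaderboard does not document its parameters.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one head-to-head comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    The K-factor of 32 is a hypothetical choice for illustration.
    """
    expected_a = elo_expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Points gained by one model are lost by the other, so the total is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Ratings taken from the Overall table above.
    sonic_2, openai_tts = 1129.46, 1118.15
    p = elo_expected_score(sonic_2, openai_tts)
    print(f"Expected score of Cartesia Sonic 2 vs OpenAI TTS: {p:.3f}")
```

Under these assumptions, a small rating gap such as the one between the top two rows corresponds to an expected score only slightly above 0.5, while a gap of several hundred points implies a strongly one-sided matchup.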