Use case: Speech Generation

About this leaderboard
This leaderboard benchmarks text-to-speech (TTS) models on intelligibility, natural prosody, speaker similarity, noise robustness, latency, and hallucination-free rendering across multiple voices and languages.
We stress-test models with curated simulations that blend structured benchmarks and
open-ended prompts. Each table captures a distinct slice of the use case, and models are
compared consistently across shared metrics to surface leaders, trade-offs, and surprising
strengths.
| Rank | Model | Elo Rating | TrueSkill Rating | Average Rank | Rank 1 Percentage (%) |
|---|---|---|---|---|---|
| 1 | Cartesia Sonic 2 | 1129.46 | 1075.99 | 1.3 | 70.1 |
| 2 | OpenAI TTS | 1118.15 | 1035.23 | 1.29 | 70.91 |
| 3 | Cartesia Sonic 1 | 1074.51 | 1036.74 | 1.32 | 68.17 |
| 4 | ElevenLabs | 1062.1 | 991 | 1.42 | 58.18 |
| 5 | AWS Polly | 1056.84 | 985.48 | 1.41 | 59.34 |
| 6 | Kokoro | 989.28 | 978.48 | 1.58 | 41.52 |
| 7 | Google TTS | 940.31 | 869.29 | 1.87 | 13.21 |
| 8 | XTTS V2 | 872.43 | 845.35 | 1.73 | 27.4 |
| 9 | Deepgram | 756.92 | 823.46 | 1.59 | 40.61 |
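
The Average Rank and Rank 1 Percentage columns appear to be linked. As a minimal sketch, assuming each evaluation is a head-to-head comparison in which a model places either first or second (an inference from the numbers, not something the leaderboard states), the average rank would equal 1·p + 2·(1 − p), where p is the share of first-place finishes. The function names below are illustrative, and the rows are copied from the table.

```python
# (model, average_rank, rank1_percentage), copied from the leaderboard table.
ROWS = [
    ("Cartesia Sonic 2", 1.3, 70.1),
    ("OpenAI TTS", 1.29, 70.91),
    ("Cartesia Sonic 1", 1.32, 68.17),
    ("ElevenLabs", 1.42, 58.18),
    ("AWS Polly", 1.41, 59.34),
    ("Kokoro", 1.58, 41.52),
    ("Google TTS", 1.87, 13.21),
    ("XTTS V2", 1.73, 27.4),
    ("Deepgram", 1.59, 40.61),
]


def expected_average_rank(rank1_pct: float) -> float:
    """Average rank under the ASSUMPTION that every comparison has
    exactly two places: rank 1 (win) or rank 2 (loss)."""
    p = rank1_pct / 100.0
    return 1 * p + 2 * (1 - p)


def max_deviation(rows) -> float:
    """Largest gap between the published and the reconstructed average rank."""
    return max(abs(expected_average_rank(pct) - avg) for _, avg, pct in rows)


if __name__ == "__main__":
    for model, avg, pct in ROWS:
        print(f"{model:18s} published={avg:<5} reconstructed={expected_average_rank(pct):.4f}")
    print(f"max deviation: {max_deviation(ROWS):.4f}")
```

Under this assumption every reconstructed value matches the published one to within two-decimal rounding, which would make the two columns restatements of the same win rate rather than independent signals. It would also explain why ordering by Elo can differ from ordering by Average Rank: Elo additionally depends on which opponents a model faced.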