Use case: Image Generation

About this leaderboard

Ranks still-image generation models on fidelity, composition, text rendering, fine-grained detail, and safety (NSFW) defenses, using aesthetic raters and pairwise preference comparisons aggregated into Elo ratings.
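The page does not say how its Elo ratings are computed. As a point of reference, below is the textbook online Elo update applied to pairwise preference votes. The starting rating of 1000, the K-factor of 32, the no-ties assumption, and all function and model names are illustrative assumptions, not the leaderboard's actual settings.

```python
# Minimal Elo sketch for pairwise image-preference votes.
# Assumptions (not from the leaderboard): every model starts at 1000,
# K = 32, and each vote is a clean win/loss with no ties.
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(ratings, winner, loser, k=32.0):
    """Apply one vote in which `winner` was preferred over `loser`."""
    e_w = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - e_w)  # upsets (low e_w) move ratings more
    ratings[winner] += delta
    ratings[loser] -= delta  # zero-sum: total rating is conserved


def run_elo(votes, initial=1000.0, k=32.0):
    """Replay (winner, loser) votes in order and return final ratings."""
    ratings = defaultdict(lambda: initial)
    for winner, loser in votes:
        update(ratings, winner, loser, k)
    return dict(ratings)


# Hypothetical usage with made-up model names and votes.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
print(run_elo(votes))
```

Because this online update depends on the order in which votes are replayed, leaderboards sometimes fit ratings over all votes at once instead; which approach this page uses is not stated.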

We stress-test models with curated simulations that blend structured benchmarks and open-ended prompts. Each table covers a distinct slice of the use case, and all models are scored on the same metrics so that leaders, trade-offs, and surprising strengths stand out.

Overall

Rank	Model	Elo Rating	Trust Skill Rating	Average Rank (lower is better)	Rank 1 Rate (%)
1	GPT Image 1	1069.17	982.86	1.41	59.31
2	GPT 4.1	1039.62	979.89	1.40	60.14
3	Recraft v3	1039.37	959.04	1.63	36.52
4	Imagen 3	1024.05	963.71	1.46	53.78
5	flux_image	1008.33	967.36	1.54	45.82
6	DALL·E 3	976.28	913.58	1.45	55.25
7	Ideogram 2.0	939.63	910.42	1.55	44.70
8	Stable Diffusion 3	903.55	895.02	1.55	45.21
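The Average Rank and Rank 1 Percentage columns are tied by a simple identity: in every row, Average Rank ≈ 2 − (Rank 1 Percentage)/100. For GPT Image 1, 2 − 0.5931 = 1.4069, which rounds to the listed 1.41. That is what you would expect if each comparison pits exactly two models against each other with no ties, so a model's rank in any single comparison is 1 or 2. The page does not state this; it is an inference from the numbers. The sketch below computes both columns from per-comparison ranks and checks the identity against the table. The helper names are hypothetical.

```python
# Hypothetical helpers: the leaderboard does not publish its scoring code.
# Assumption: each comparison has exactly two entrants and no ties, so a
# model's rank in any single comparison is either 1 or 2.

# Values copied from the Overall table: model -> (Average Rank, Rank 1 Percentage)
OVERALL = {
    "GPT Image 1": (1.41, 59.31),
    "GPT 4.1": (1.40, 60.14),
    "Recraft v3": (1.63, 36.52),
    "Imagen 3": (1.46, 53.78),
    "flux_image": (1.54, 45.82),
    "DALL·E 3": (1.45, 55.25),
    "Ideogram 2.0": (1.55, 44.70),
    "Stable Diffusion 3": (1.55, 45.21),
}


def summarize(ranks):
    """Return (average rank, rank-1 percentage) from per-comparison ranks."""
    if not ranks:
        raise ValueError("need at least one comparison")
    average = sum(ranks) / len(ranks)
    rank_1_pct = 100.0 * sum(1 for r in ranks if r == 1) / len(ranks)
    return average, rank_1_pct


def average_rank_from_win_rate(rank_1_pct):
    """Two entrants, no ties: E[rank] = 1*p + 2*(1 - p) = 2 - p."""
    return 2.0 - rank_1_pct / 100.0


def check_table(table, tol=0.005):
    """List rows whose Average Rank disagrees with 2 - p beyond 2-decimal rounding."""
    mismatches = []
    for model, (avg_rank, rank_1_pct) in table.items():
        predicted = average_rank_from_win_rate(rank_1_pct)
        if abs(predicted - avg_rank) > tol:
            mismatches.append((model, avg_rank, round(predicted, 4)))
    return mismatches


print(check_table(OVERALL))  # → []
```

If ties were allowed or a comparison involved more than two models, the identity would not hold in this form. It also ignores who each model faced, which Elo does weigh; that may help explain why Recraft v3 ranks third by Elo despite the lowest Rank 1 Percentage in the table.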