EvalAlign: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models
Each cell gives the score followed by the model's rank on that metric in parentheses.

Model | Image Faithfulness (Human) | Image Faithfulness (EvalAlign) | Text-image Alignment (Human) | Text-image Alignment (EvalAlign) |
---|---|---|---|---|
PixArt-XL-2-1024-MS | 2.2848 (1) | 1.6415 (1) | 5.1100 (5) | 5.3100 (7) |
Dreamlike Photoreal v2.0 | 2.0070 (2) | 1.4522 (4) | 4.5600 (17) | 4.9800 (13) |
SDXL Refiner v1.0 | 1.9229 (3) | 1.6072 (2) | 5.2100 (3) | 5.4000 (3) |
SDXL v1.0 | 1.8136 (4) | 1.4675 (3) | 5.0300 (8) | 5.3500 (4) |
Wuerstchen | 1.7837 (5) | 1.4279 (5) | 4.8700 (9) | 5.1700 (5) |
LCM SDXL | 1.6910 (6) | 1.3391 (7) | 5.1800 (4) | 5.3300 (6) |
Openjourney | 1.6667 (7) | 1.1750 (10) | 4.8300 (10) | 4.9200 (15) |
Safe SD MAX | 1.6491 (8) | 1.2175 (8) | 4.3100 (23) | 4.5900 (23) |
LCM LoRA SDXL | 1.6387 (9) | 1.3833 (6) | 5.0600 (7) | 5.2700 (8) |
Safe SD STRONG | 1.6308 (10) | 1.1466 (11) | 4.6000 (16) | 4.8300 (17) |
Safe SD MEDIUM | 1.6275 (11) | 1.1298 (15) | 4.4000 (21) | 4.5600 (24) |
Safe SD WEAK | 1.6078 (12) | 1.1188 (17) | 4.5300 (18) | 4.7100 (20) |
SD v2.1 | 1.5524 (13) | 1.1094 (18) | 4.8000 (11) | 5.0700 (11) |
SD v2.0 | 1.5277 (14) | 1.1300 (14) | 4.6400 (14) | 5.0100 (12) |
Openjourney v2 | 1.5000 (15) | 0.9956 (20) | 4.1500 (24) | 4.6500 (22) |
Redshift Diffusion | 1.4733 (16) | 1.1382 (12) | 4.3500 (22) | 4.6700 (21) |
Dreamlike Diffusion v1.0 | 1.4652 (17) | 1.2052 (9) | 4.6600 (13) | 5.1500 (10) |
SD v1.5 | 1.4417 (18) | 1.1362 (13) | 4.4500 (20) | 4.9000 (16) |
IF-I-XL v1.0 | 1.3808 (19) | 0.9221 (22) | 5.4500 (1) | 5.5300 (1) |
SD v1.4 | 1.3592 (20) | 0.9511 (21) | 4.5200 (19) | 4.7600 (19) |
Vintedois Diffusion v0.1 | 1.3562 (21) | 1.0797 (19) | 4.6200 (15) | 4.9500 (14) |
IF-I-L v1.0 | 1.2635 (22) | 0.8814 (23) | 5.2300 (2) | 5.4500 (2) |
MultiFusion | 1.2372 (23) | 1.1298 (16) | 4.6800 (12) | 4.8000 (18) |
IF-I-M v1.0 | 1.0135 (24) | 0.7928 (24) | 5.0800 (6) | 5.2200 (9) |
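
One way to read the table is to ask how closely the EvalAlign ranking tracks the human ranking. The sketch below is a minimal illustration, not the authors' evaluation code. It takes the image-faithfulness rank attached to each score in the table above, in row order, and computes Spearman's rank correlation with the tie-free formula. Choosing Spearman's rho is an assumption made for illustration and may differ from the agreement statistic the authors report. The helper `spearman_from_ranks` is a hypothetical name.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman's rho for two equal-length rank lists that contain no ties."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rank lists must have the same length")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    # Sum of squared rank differences, item by item.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    # Closed-form Spearman coefficient, valid only when ranks are tie-free.
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Image-faithfulness ranks copied from the table, in row order.
# Rows are already sorted by the human score, so human ranks are 1..24.
human_ranks = list(range(1, 25))
evalalign_ranks = [1, 4, 2, 3, 5, 7, 10, 8, 6, 11, 15, 17,
                   18, 14, 20, 12, 9, 13, 22, 21, 19, 23, 16, 24]

rho = spearman_from_ranks(human_ranks, evalalign_ranks)
print(f"Spearman rho (image faithfulness, human vs. EvalAlign): {rho:.4f}")
```

The same function can be applied to the text-image alignment columns by copying their ranks. Working from the published ranks rather than re-ranking the scores avoids having to choose a tie-breaking rule for models with identical scores.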