ICLR 2024 Spotlight

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Dustin Podell · Zion English · Kyle Lacey · Andreas Blattmann · Tim Dockhorn · Jonas Müller · Joe Penna · Robin Rombach

Abstract:

We present Stable Diffusion XL (SDXL), a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a UNet backbone three times larger, achieved by significantly increasing the number of attention blocks and including a second text encoder. Further, we design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. To ensure the highest-quality results, we also introduce a refinement model that improves the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL improves dramatically over previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators such as Midjourney.
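
One common way to train on multiple aspect ratios is aspect-ratio bucketing, where each training image is assigned to the resolution bucket whose aspect ratio is closest to its own. The sketch below is illustrative only: the abstract does not specify the bucketing scheme, so the 1024 x 1024 pixel budget, the 64-pixel step, the side-length limits, and the helper names `make_buckets` and `nearest_bucket` are all assumptions rather than the authors' implementation.

```python
import math


def make_buckets(target_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (width, height) buckets whose sides are multiples of `step`
    and whose area stays close to `target_area`. All defaults are assumed."""
    buckets = []
    for width in range(min_side, max_side + 1, step):
        # Choose the height (a multiple of `step`) that best matches the pixel budget.
        height = round(target_area / width / step) * step
        if min_side <= height <= max_side:
            buckets.append((width, height))
    return buckets


def nearest_bucket(width, height, buckets):
    """Return the bucket whose aspect ratio is closest to the image's.
    Ratios are compared in log space so portrait and landscape are treated symmetrically."""
    log_ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - log_ratio))


buckets = make_buckets()
landscape_bucket = nearest_bucket(1920, 1080, buckets)  # a 16:9 source image
square_bucket = nearest_bucket(1000, 1000, buckets)     # a square source image
```

In a training loop built this way, images would be resized and cropped to their assigned bucket, and each batch would be drawn from a single bucket so that all tensors in the batch share one shape.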
