Emerging Pixel-level Semantic Knowledge in Diffusion Models
Koichi Namekata · Amirmojtaba Sabour · Sanja Fidler · Seung Wook Kim
Diffusion models have recently received increasing research attention for their impressive transferability to semantic segmentation tasks. However, previous works rely on additional supervision to produce fine-grained segmentation maps, leaving it unclear how well diffusion models alone understand the semantic relations within their generated images. To help answer this question, we exploit the semantic knowledge extracted from Stable Diffusion (SD) and build an image segmentor that produces fine-grained segmentation maps without any additional training. The central challenge, and the reason this task has been difficult for previous works, is that semantically meaningful feature maps typically exist only in the spatially lower-dimensional layers of the network, which makes it infeasible to extract pixel-level semantic relations directly from those feature maps. To overcome this, our framework identifies semantic correspondences between image pixels and spatial locations of the low-dimensional feature maps by analyzing SD’s generation process, and uses them to construct image-resolution segmentation maps. In extensive experiments, the produced segmentation maps are shown to be well delineated and to capture detailed parts of the images, indicating the existence of highly accurate pixel-level semantic knowledge in diffusion models.
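To make the setting concrete, below is a minimal sketch of the general recipe the abstract alludes to: capture a low-resolution feature map from SD's UNet during generation and group its spatial locations into semantic clusters. It assumes the `diffusers` library and a CUDA GPU; the hooked layer (`up_blocks[1]`), the use of k-means, and the naive nearest-neighbor upsampling are illustrative choices, not the paper's method, which instead derives pixel-to-feature correspondences from the generation process itself.

```python
# Sketch only: extract a low-resolution semantic feature map from Stable
# Diffusion's UNet and cluster it into a coarse segmentation. Assumptions:
# the `diffusers` library, a CUDA GPU, and an arbitrary choice of hooked layer.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline
from sklearn.cluster import KMeans

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

features = []  # low-resolution UNet feature maps captured during denoising

def grab(module, inputs, output):
    # output: (batch, channels, h, w); with classifier-free guidance the
    # batch holds [unconditional, conditional] copies.
    features.append(output.detach().float().cpu())

# Hook an intermediate up-block; which layer is most semantic is empirical.
handle = pipe.unet.up_blocks[1].register_forward_hook(grab)
image = pipe("a photo of a cat on a sofa", num_inference_steps=30).images[0]
handle.remove()

feat = features[-1][-1]                      # conditional branch, last step: (C, h, w)
C, h, w = feat.shape
flat = feat.permute(1, 2, 0).reshape(-1, C)  # one feature vector per spatial location

# Group the h*w spatial locations into k semantic clusters.
k = 5
labels = KMeans(n_clusters=k, n_init=10).fit_predict(flat.numpy())
seg_low = torch.from_numpy(labels).float().reshape(1, 1, h, w)

# Naive upsampling to image resolution. The paper instead builds pixel-level
# correspondences to the low-dimensional feature map for sharp boundaries.
seg = F.interpolate(seg_low, size=image.size[::-1], mode="nearest")[0, 0].long()
print(seg.shape, seg.unique())               # image-resolution label map with k classes
```

Running this on a generated image yields a coarse k-way label map whose blocky boundaries come from the naive upsampling; the gap between that and the sharp, image-resolution maps reported in the paper is precisely where the correspondence analysis of SD's generation process matters.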