

Poster

Object-Aware Inversion and Reassembly for Image Editing

Zhen Yang · Ganggui Ding · Wen Wang · Hao Chen · Bohan Zhuang · Chunhua Shen

Halle B
Thu 9 May 1:45 a.m. PDT — 3:45 a.m. PDT

Abstract:

Diffusion-based image editing methods have achieved remarkable advances in text-driven image editing. The task is to edit an input image, described by an original prompt, into a desired image aligned with a target prompt. By comparing the original and target prompts, we can obtain numerous editing pairs, each comprising an object and its corresponding editing target. To allow editability while maintaining fidelity to the input image, existing editing methods typically apply a fixed number of inversion steps that project the whole input image to its noisier latent representation, followed by a denoising process guided by the target prompt. However, we find that the optimal number of inversion steps for achieving ideal editing results varies significantly among editing pairs, owing to their varying editing difficulties. Therefore, the current literature, which relies on a fixed number of inversion steps, produces suboptimal generation quality, especially when handling multiple editing pairs in a natural image.

To this end, we propose a new image editing paradigm, dubbed Object-aware Inversion and Reassembly (OIR), to enable object-level fine-grained editing. Specifically, we design a new search metric that determines the optimal inversion step for each editing pair by jointly considering the editability of the target and the fidelity of the non-editing region. When editing an image, we first search for the optimal inversion step of each editing pair with our search metric and edit the pairs separately to circumvent concept mismatch. Subsequently, we propose an additional reassembly step to seamlessly integrate the respective editing results and the non-editing region into the final edited image. To systematically evaluate the effectiveness of our method, we collect two datasets for benchmarking single- and multi-object editing, respectively. Experiments demonstrate that our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
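As a rough illustration of the pipeline described above, the Python sketch below searches a per-pair inversion step and then composites the per-pair results. All names here are assumptions, not the authors' code: the callables invert, denoise, editability, and fidelity stand in for the paper's inversion, guided denoising, and the two terms of its search metric, whose exact forms are not given in the abstract; the weight alpha and the simple mask-based composite are likewise illustrative (the paper's reassembly uses an additional denoising step rather than pixel pasting).

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class EditingPair:
        target_prompt: str
        mask: np.ndarray  # boolean mask selecting the object's region

    def search_optimal_step(image, pair, candidate_steps, invert, denoise,
                            editability, fidelity, alpha=0.5):
        # For one editing pair, try each candidate inversion step, score the
        # result with the joint metric, and keep the best-scoring edit.
        best_step, best_score, best_edit = None, -np.inf, None
        for t in candidate_steps:
            latent = invert(image, t)                        # project image to its step-t latent
            edited = denoise(latent, pair.target_prompt, t)  # denoise guided by the target prompt
            score = (alpha * editability(edited, pair)                 # editability of the target
                     + (1.0 - alpha) * fidelity(edited, image, ~pair.mask))  # fidelity elsewhere
            if score > best_score:
                best_step, best_score, best_edit = t, score, edited
        return best_step, best_edit

    def edit_image(image, pairs, candidate_steps, invert, denoise,
                   editability, fidelity):
        # Search each pair separately (avoiding concept mismatch between
        # objects), then composite every edited region onto the untouched
        # non-editing region.
        out = image.copy()
        for pair in pairs:
            _, edited = search_optimal_step(image, pair, candidate_steps,
                                            invert, denoise, editability, fidelity)
            out[pair.mask] = edited[pair.mask]
        return out

The point the sketch captures is the paradigm shift in the abstract: the inversion depth t is chosen per editing pair by the search metric rather than fixed globally for the whole image.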
