ICLR Poster InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior

Spotlight Poster

Chenguo Lin · Yadong MU

Halle B

Abstract:

Comprehending natural language instructions is a desirable property for 3D indoor scene synthesis systems. Existing methods directly model the object distributions within a scene, which hinders the controllability of generation. We introduce InstructScene, a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity for 3D scene synthesis. The proposed semantic graph prior jointly learns indoor scene appearance and layout distributions, exhibiting versatility across various generative tasks. To facilitate benchmarking of text-driven 3D scene synthesis, we curate a high-quality dataset of scene-instruction pairs with large language and multimodal models. Extensive experimental results reveal that the proposed method surpasses existing state-of-the-art approaches by a large margin. Thorough ablation studies confirm the efficacy of crucial design components. Both our code and dataset will be made publicly available after the review period.
