ICLR 2024 DyST: Towards Dynamic Neural Scene Representations on Real-World Videos Spotlight

DyST: Towards Dynamic Neural Scene Representations on Real-World Videos

Maximilian Seitzer · Sjoerd van Steenkiste · Thomas Kipf · Klaus Greff · Mehdi S. M. Sajjadi

Abstract:

Visual understanding of our world goes beyond the semantics and flat structure of individual images. In this paper, we work towards capturing both the 3D structure as well as the dynamics of real-world scenes from monocular real-world videos. Our model, the Dynamic Scene Transformer (DyST), builds upon recent work in neural scene representation and learns a latent decomposition into scene content as well as per-view scene dynamics and camera pose. This separation is achieved through a special co-training scheme on monocular videos and our new synthetic dataset DySO. DyST learns tangible latent representations for dynamic scenes that enable view generation with separate control over the camera and the content of the scene.
