

Poster

Cameras as Rays: Sparse-view Pose Estimation via Ray Diffusion

Jason Zhang · Amy Lin · Moneish Kumar · Tzu-Hsuan Yang · Deva Ramanan · Shubham Tulsiani

Halle B
Wed 8 May 1:45 a.m. PDT — 3:45 a.m. PDT
 
Oral presentation: Oral 7B
Fri 10 May 1 a.m. PDT — 1:45 a.m. PDT

Abstract: Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparse views ($<10$). In contrast to existing approaches that pursue top-down prediction of global parametrizations of camera extrinsics, we propose a distributed representation of camera pose that treats a camera as a bundle of rays. This representation allows for a tight coupling with spatial image features, improving pose precision. We observe that this representation is naturally suited for set-level transformers and develop a regression-based approach that maps image patches to corresponding rays. To capture the inherent uncertainties in sparse-view pose inference, we adapt this approach to learn a denoising diffusion model, which allows us to sample plausible modes while improving performance. Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D while generalizing to unseen object categories and in-the-wild captures.
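
To make the "camera as a bundle of rays" idea concrete, the following is a minimal sketch of turning one pinhole camera into one ray per image patch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how rays are parametrized, so the sketch assumes Plücker coordinates (unit direction plus moment), a world-to-camera convention `x_cam = R @ x_world + t`, and a square grid of patches. The function name `camera_to_rays` and the `grid` parameter are hypothetical.

```python
import numpy as np

def camera_to_rays(K, R, t, image_size, grid=4):
    """Return a (grid*grid, 6) array of rays, one per patch center.

    Each row is (direction, moment) in world coordinates. This Plücker
    parametrization is an assumption of the sketch, not stated in the abstract.
    Assumed convention: x_cam = R @ x_world + t.
    """
    h, w = image_size
    # Pixel coordinates of the patch centers, in homogeneous form.
    us = (np.arange(grid) + 0.5) * w / grid
    vs = (np.arange(grid) + 0.5) * h / grid
    u, v = np.meshgrid(us, vs)
    pix = np.stack([u.ravel(), v.ravel(), np.ones(grid * grid)], axis=1)

    # Camera center in world coordinates.
    center = -R.T @ t

    # Back-project pixels to camera-frame directions, then rotate to world.
    dirs_cam = pix @ np.linalg.inv(K).T
    dirs_world = dirs_cam @ R  # row-wise equivalent of R.T @ d
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)

    # Plücker moment: m = c x d, identical for any point c on the ray.
    moments = np.cross(center, dirs_world)
    return np.concatenate([dirs_world, moments], axis=1)


# Example with made-up intrinsics and pose.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])
rays = camera_to_rays(K, R, t, image_size=(100, 100), grid=2)
```

Under these assumptions, every ray of a camera passes through the same center, so a set of per-patch rays over-determines the pose. That redundancy is one way to read the abstract's claim that a distributed representation couples tightly with spatial image features.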
