Poster
Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis
Qitao Zhao · Shubham Tulsiani
Inferring the 3D structure underlying a set of multi-view images requires solving two co-dependent tasks: accurate 3D reconstruction requires precise camera poses, and predicting camera poses relies on (implicitly or explicitly) modeling the underlying 3D. The classical framework of analysis by synthesis casts this inference as a joint optimization seeking to explain the observed pixels, and recent instantiations typically learn expressive 3D representations (e.g., Neural Fields) with gradient-descent-based pose refinement of off-the-shelf pose estimates. However, given a sparse set of observed views, the observations may not provide sufficient direct evidence to obtain complete and accurate 3D. Moreover, large errors in pose estimation may not be easily corrected and can further degrade the inferred 3D. To allow robust 3D reconstruction and pose estimation in this challenging setup, we propose a method that adapts this analysis-by-synthesis approach by: a) incorporating novel-view-synthesis-based generative priors alongside photometric objectives to improve the quality of the inferred 3D, and b) explicitly reasoning about outlier pose estimates and correcting them via a strategy that combines discrete search with continuous optimization. We validate our framework across real-world and synthetic datasets, using several off-the-shelf pose estimation systems as initialization. We find that it significantly improves the base systems' pose accuracy while yielding high-quality 3D reconstructions that outperform current unposed multi-view reconstruction baselines.
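To make the structure of the optimization concrete, the following is a toy, self-contained PyTorch sketch of an analysis-by-synthesis loop of the kind the abstract describes. Everything in it is an illustrative assumption rather than the authors' implementation: the "scene" is a learnable 2D canvas, "poses" are per-view 2x3 affine transforms, rendering is differentiable warping via grid_sample, total_variation() is a placeholder for the learned novel-view-synthesis prior, and correct_outliers() mimics the residual-triggered discrete-search-then-continuous-refinement pose correction.

```python
# Hypothetical sketch only; render(), total_variation(), refine(), and
# correct_outliers() are illustrative stand-ins, not the paper's method.
import math
import torch
import torch.nn.functional as F


def render(canvas, theta):
    # Differentiably "render" the canonical canvas under a 2x3 affine pose.
    grid = F.affine_grid(theta.unsqueeze(0), canvas.unsqueeze(0).shape,
                         align_corners=False)
    return F.grid_sample(canvas.unsqueeze(0), grid, align_corners=False)[0]


def total_variation(img):
    # Toy smoothness regularizer standing in for the generative prior.
    return ((img[..., 1:, :] - img[..., :-1, :]).abs().mean()
            + (img[..., :, 1:] - img[..., :, :-1]).abs().mean())


def refine(views, init_thetas, steps=200, lr=1e-2, prior_weight=0.1):
    # Jointly optimize the scene and all poses against a photometric loss
    # plus a prior term: the analysis-by-synthesis objective.
    canvas = torch.zeros_like(views[0]).requires_grad_(True)
    thetas = init_thetas.clone().requires_grad_(True)
    opt = torch.optim.Adam([canvas, thetas], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        renders = torch.stack([render(canvas, t) for t in thetas])
        photo = (renders - views).abs().mean(dim=(1, 2, 3))  # per-view residual
        loss = photo.mean() + prior_weight * total_variation(canvas)
        loss.backward()
        opt.step()
    return canvas.detach(), thetas.detach(), photo.detach()


def correct_outliers(views, canvas, thetas, photo, n_candidates=8):
    # Flag views with anomalously high residuals, then run a discrete search
    # over candidate rotations; the winner would normally be handed back to
    # continuous refinement (another refine() round).
    bad = photo > photo.mean() + 2 * photo.std()
    for i in torch.nonzero(bad).flatten():
        best, best_err = thetas[i], photo[i]
        for k in range(n_candidates):
            a = 2 * math.pi * k / n_candidates
            cand = torch.tensor([[math.cos(a), -math.sin(a), 0.0],
                                 [math.sin(a), math.cos(a), 0.0]])
            err = (render(canvas, cand) - views[i]).abs().mean()
            if err < best_err:
                best, best_err = cand, err
        thetas[i] = best
    return thetas


if __name__ == "__main__":
    # Toy data: identity-posed copies of one scene, one with a corrupted init.
    torch.manual_seed(0)
    scene = torch.rand(3, 32, 32)
    eye = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    views = torch.stack([render(scene, eye) for _ in range(4)])
    init = eye.repeat(4, 1, 1)
    init[3, 0, 2] = 0.5  # simulate a large error from the base pose estimator
    canvas, thetas, photo = refine(views, init)
    thetas = correct_outliers(views, canvas, thetas, photo)
```

In the paper's actual setting the canvas would be a full 3D representation, the poses rigid camera transforms, and the prior a learned novel-view synthesizer; the sketch aims only to convey the loop's shape, joint gradient updates on scene and poses plus residual-triggered discrete re-search for outlier views.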