Poster
In-N-Out: Lifting 2D Diffusion Prior for 3D Object Removal via Tuning-Free Latents Alignment
Dongting Hu · Huan Fu · Jiaxian Guo · Liuhua Peng · Tingjin Chu · Feng Liu · Tongliang Liu · Mingming Gong
Neural representations for 3D scenes have advanced substantially in recent years, yet object removal remains a practical but challenging problem due to the absence of multi-view supervision over occluded areas. Diffusion Models (DMs), trained on extensive 2D image collections, exhibit diverse and high-fidelity generative capabilities in the 2D domain. However, because they are not specifically trained on 3D data, applying them to multi-view data often exacerbates inconsistency and thus degrades the overall quality of the 3D output. To address these issues, we introduce ``In-N-Out'', a novel approach that first \underline{in}paints a prior, i.e., the occluded area from a single view, using DMs, and then \underline{out}stretches it to create multi-view inpaintings via latent alignment. Our analysis identifies that the variability in DMs' outputs arises mainly from the initially sampled latents and the intermediate latents predicted during the denoising process. We explicitly align the \textbf{initial} latents using a Neural Radiance Field (NeRF) to establish a consistent foundational structure in the inpainted area, complemented by an implicit alignment of the \textbf{intermediate} latents through cross-view attention during the denoising phase, which enhances appearance consistency across views. To further improve rendering results, we apply a patch-based hybrid loss to optimize the NeRF. We demonstrate that our techniques effectively mitigate the inconsistencies introduced by DMs and substantially improve the fidelity and coherence of inpainted 3D representations.
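To make the implicit intermediate-latent alignment concrete, below is a minimal PyTorch sketch of one common way cross-view attention is realized: each view's self-attention layer is extended so its queries also attend to the keys and values of the inpainted reference view. This is an illustrative reading of the mechanism described in the abstract, not the authors' released code; the function name and tensor shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def cross_view_attention(q, k, v, k_ref, v_ref):
    """Attend over the current view's tokens plus a reference view's tokens.

    q, k, v:      (B, N, C) query/key/value tokens of the current view.
    k_ref, v_ref: (B, N, C) key/value tokens from the reference
                  (single-view inpainted) denoising pass.
    Returns:      (B, N, C) attention output for the current view.
    """
    # Concatenating reference keys/values lets every view borrow appearance
    # from the reference, implicitly aligning intermediate latents across views.
    k_cat = torch.cat([k, k_ref], dim=1)  # (B, 2N, C)
    v_cat = torch.cat([v, v_ref], dim=1)  # (B, 2N, C)
    return F.scaled_dot_product_attention(q, k_cat, v_cat)

# Usage with hypothetical shapes (e.g., 32x32 latent grid, 320 channels):
B, N, C = 2, 1024, 320
q, k, v = (torch.randn(B, N, C) for _ in range(3))
k_ref, v_ref = (torch.randn(B, N, C) for _ in range(2))
out = cross_view_attention(q, k, v, k_ref, v_ref)  # (B, N, C)
```

In such a design, only the attention layers are modified at inference time, so the approach stays tuning-free: no weights of the pretrained DM need to be updated.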