NeurIPS Poster Exocentric-to-Egocentric Video Generation

Poster

Exocentric-to-Egocentric Video Generation

Jia-Wei Liu · Weijia Mao · Zhongcong XU · Jussi Keppo · Mike Zheng Shou

West Ballroom A-D #7105

[ Abstract ]

[ Paper] [ Poster] [ OpenReview]

Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

We introduce Exo2Ego-V, a novel exocentric-to-egocentric diffusion-based video generation method for daily-life skilled human activities where sparse 4-view exocentric viewpoints are configured 360° around the scene. This task is particularly challenging due to the significant variations between exocentric and egocentric viewpoints and high complexity of dynamic motions and real-world daily-life environments. To address these challenges, we first propose a new diffusion-based multi-view exocentric encoder to extract the dense multi-scale features from multi-view exocentric videos as the appearance conditions for egocentric video generation. Then, we design an exocentric-to-egocentric view translation prior to provide spatially aligned egocentric features as a concatenation guidance for the input of egocentric video diffusion model. Finally, we introduce the temporal attention layers into our egocentric video diffusion pipeline to improve the temporal consistency cross egocentric frames. Extensive experiments demonstrate that Exo2Ego-V significantly outperforms SOTA approaches on 5 categories from the Ego-Exo4D dataset with an average of 35% in terms of LPIPS. Our code and model will be made available on https://github.com/showlab/Exo2Ego-V.

Chat is not available.