Spotlight in Workshop: Physical Reasoning and Inductive Biases for the Real World
3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators
Hsiao-Yu Tung · Zhou Xian · Mihir Prabhudesai · Katerina Fragkiadaki
We propose an action-conditioned dynamics model that predicts scene changes caused by object and agent interactions in a viewpoint-invariant 3D neural scene representation space, inferred from RGB-D videos. In this 3D feature space, objects do not interfere with one another, and their appearance persists over time and across viewpoints. This permits our model to predict scenes far into the future simply by “moving” 3D object features according to cumulative object motion predictions. Object motion predictions are computed by a graph neural network that operates over the object features extracted from the 3D neural scene representation. Our model generalizes well across varying numbers and appearances of interacting objects, as well as across camera viewpoints, outperforming existing 2D and 3D dynamics models, and enables successful sim-to-real transfer.
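To make the abstract's pipeline concrete, below is a minimal, hedged sketch of the core idea: a graph neural network over per-object feature vectors predicts a per-object motion conditioned on the agent action, and long-horizon prediction is obtained by "moving" the persistent object features with the accumulated predicted motions. This is not the authors' implementation; all module names, dimensions, and the simplification of motion to a 3D translation are illustrative assumptions.

```python
# Illustrative sketch (assumptions, not the paper's code): a graph network
# over per-object features predicts a per-object motion (a 3D translation
# here, for simplicity), and rollout moves object positions while the
# object features themselves persist unchanged.

import torch
import torch.nn as nn

class ObjectGraphDynamics(nn.Module):
    """Predicts per-object motion from object features and an agent action."""
    def __init__(self, feat_dim=64, action_dim=4, hidden=128):
        super().__init__()
        # Edge model: messages between every (sender, receiver) object pair.
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden))
        # Node model: updates each object from its features, aggregated
        # messages, and the action; outputs a 3D translation.
        self.node_mlp = nn.Sequential(
            nn.Linear(feat_dim + hidden + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))

    def forward(self, obj_feats, action):
        # obj_feats: (N, feat_dim) per-object features from the 3D scene map
        # action:    (action_dim,) agent action (e.g., a push)
        n = obj_feats.shape[0]
        send = obj_feats.unsqueeze(1).expand(n, n, -1)
        recv = obj_feats.unsqueeze(0).expand(n, n, -1)
        # All-pairs messages, summed per receiving object.
        messages = self.edge_mlp(torch.cat([send, recv], dim=-1)).sum(dim=0)
        act = action.unsqueeze(0).expand(n, -1)
        return self.node_mlp(torch.cat([obj_feats, messages, act], dim=-1))

def rollout(model, obj_feats, obj_positions, actions):
    """Long-horizon prediction by accumulating per-step motion predictions.
    Object features persist; only their 3D positions are updated ("moved")."""
    positions = [obj_positions]
    for action in actions:
        delta = model(obj_feats, action)          # predicted per-object motion
        positions.append(positions[-1] + delta)   # move objects, keep features
    return positions

# Toy usage: 3 objects, 5 actions.
model = ObjectGraphDynamics()
feats = torch.randn(3, 64)
pos = torch.randn(3, 3)
acts = [torch.randn(4) for _ in range(5)]
trajectory = rollout(model, feats, pos, acts)
```

Because the object features are never re-rendered or re-encoded during rollout, appearance cannot drift over time; only the predicted motions accumulate, which is what allows prediction far into the future.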