

Oral

Oral Session 4D: Machine Vision

West Meeting Room 211-214
Thu 12 Dec 3:30 p.m. PST — 4:30 p.m. PST

Thu 12 Dec. 15:30 - 15:50 PST

GIC: Gaussian-Informed Continuum for Physical Property Identification and Simulation

Junhao Cai · Yuji Yang · Weihao Yuan · Yisheng HE · Zilong Dong · Liefeng Bo · Hui Cheng · Qifeng Chen

This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to render object masks as 2D shape surrogates during training. We propose a new dynamic 3D Gaussian framework based on motion factorization to recover the object as 3D Gaussian point sets across different time states. Furthermore, we develop a coarse-to-fine filling strategy to generate the density fields of the object from the Gaussian reconstruction, allowing for the extraction of object continuums along with their surfaces and the integration of Gaussian attributes into these continuums. In addition to the extracted object surfaces, the Gaussian-informed continuum also enables the rendering of object masks during simulations, serving as 2D-shape guidance for physical property estimation. Extensive experimental evaluations demonstrate that our pipeline achieves state-of-the-art performance across multiple benchmarks and metrics. Additionally, we illustrate the effectiveness of the proposed method through real-world demonstrations, showcasing its practical utility. Our project page is at https://jukgei.github.io/project/gic.

Thu 12 Dec. 15:50 - 16:10 PST

MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

Minghua Liu · Chong Zeng · Xinyue Wei · Ruoxi Shi · Linghao Chen · Chao Xu · Mengqi Zhang · Zhaoning Wang · Xiaoshuai Zhang · Isabella Liu · Hongzhi Wu · Hao Su

Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically entail expensive training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. Specifically, instead of using a triplane representation, we store features in 3D sparse voxels and combine transformers with 3D convolutions to leverage an explicit 3D structure and projective bias. In addition to sparse-view RGB input, we require the network to take normal maps as input and to generate corresponding normal maps as output. The input normal maps can be predicted by 2D diffusion models, which significantly aids the guidance and refinement of geometry learning. Moreover, by combining Signed Distance Function (SDF) supervision with surface rendering, we directly learn to generate high-quality meshes without the need for complex multi-stage training processes. By incorporating these explicit 3D biases, MeshFormer can be trained efficiently and deliver high-quality textured meshes with fine-grained geometric details. It can also be integrated with 2D diffusion models to enable fast single-image-to-3D and text-to-3D tasks. Videos are available at https://meshformer3d.github.io/
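For readers unfamiliar with the term, a signed distance function maps a 3D point to its distance from a surface, with the sign indicating which side the point is on. The sketch below is a generic illustration for an analytic sphere, not MeshFormer's learned SDF; the function name `sphere_sdf` and the negative-inside sign convention are assumptions made for this example.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to a sphere surface.

    Assumed convention: negative inside, zero on the surface,
    positive outside. The sphere is only an illustrative shape.
    """
    return math.dist(point, center) - radius


# The surface (and hence a mesh) corresponds to the zero level set.
print(sphere_sdf((0.0, 0.0, 0.0)))  # point at the center (inside)
print(sphere_sdf((1.0, 0.0, 0.0)))  # point on the surface
print(sphere_sdf((2.0, 0.0, 0.0)))  # point outside
```

A mesh is typically extracted from such a field by locating where the value crosses zero, for example with a marching-cubes style algorithm.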

Thu 12 Dec. 16:10 - 16:30 PST

E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection

Jiaqing Zhang · Mingxiang Cao · Weiying Xie · Jie Lei · Daixun Li · Wenbo Huang · Yunsong Li · Xue Yang

Multimodal image fusion and object detection are crucial for autonomous driving. While current methods have advanced the fusion of texture details and semantic information, their complex training processes hinder broader applications. Addressing this challenge, we introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection. E2E-MFD streamlines the process, achieving high performance with a single training phase. It employs synchronous joint optimization across components to avoid suboptimal solutions associated with individual tasks. Furthermore, it implements a comprehensive optimization strategy in the gradient matrix for shared parameters, ensuring convergence to an optimal fusion detection configuration. Our extensive testing on multiple public datasets reveals E2E-MFD's superior capabilities, showcasing not only visually appealing image fusion but also impressive detection outcomes, such as a 3.9% and 2.0% $\text{mAP}_{50}$ increase on the horizontal object detection dataset M3FD and the oriented object detection dataset DroneVehicle, respectively, compared to state-of-the-art approaches.
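As background on the reported metric, the subscript 50 in mAP conventionally refers to an intersection-over-union (IoU) threshold of 0.5 when matching a predicted box to a ground-truth box. The sketch below illustrates that matching test for axis-aligned (horizontal) boxes only; it is not the evaluation code used by the authors, the helper names `iou` and `is_match_at_50` are hypothetical, and oriented boxes such as those in DroneVehicle would need a rotated-box IoU instead.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box):
    """True if the prediction overlaps the ground truth with IoU >= 0.5.

    Assumption: the threshold is inclusive; some benchmarks use a
    strict inequality instead.
    """
    return iou(pred_box, gt_box) >= 0.5


print(is_match_at_50((0, 0, 4, 4), (0, 0, 4, 3)))  # large overlap
print(is_match_at_50((0, 0, 2, 2), (1, 0, 3, 2)))  # partial overlap
```

Average precision is then computed per class from the ranked list of matched and unmatched detections, and mAP is the mean over classes.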