Poster
UniDSeg: Unified Cross-Domain 3D Semantic Segmentation via Visual Foundation Models Prior
Yao Wu · Mingwei Xing · Yachao Zhang · Xiaotong Luo · Yuan Xie · Yanyun Qu
East Exhibit Hall A-C #1606
Abstract:
3D semantic segmentation using an adapting model trained from a source domain with or without accessing unlabeled target-domain data is the fundamental task in computer vision, containing domain adaptation and domain generalization.The essence of simultaneously solving cross-domain tasks is to enhance the generalizability of the encoder.In light of this, we propose a groundbreaking universal method with the help of off-the-shelf Visual Foundation Models (VFMs) to boost the adaptability and generalizability of cross-domain 3D semantic segmentation, dubbed $\textbf{UniDSeg}$.Our method explores the VFMs prior and how to harness them, aiming to inherit the recognition ability of VFMs.Specifically, this method introduces layer-wise learnable blocks to the VFMs, which hinges on alternately learning two representations during training: (i) Learning visual prompt. The 3D-to-2D transitional prior and task-shared knowledge is captured from the prompt space, and then (ii) Learning deep query. Spatial Tunability is constructed to the representation of distinct instances driven by prompts in the query space.Integrating these representations into a cross-modal learning framework, UniDSeg efficiently mitigates the domain gap between 2D and 3D modalities, achieving unified cross-domain 3D semantic segmentation.Extensive experiments demonstrate the effectiveness of our method across widely recognized tasks and datasets, all achieving superior performance over state-of-the-art methods. Remarkably, UniDSeg achieves 57.5\%/54.4\% mIoU on ``A2D2/sKITTI'' for domain adaptive/generalized tasks. Code is available at https://github.com/Barcaaaa/UniDSeg.
Chat is not available.