Poster
Geometric Exploitation for Indoor Panoramic Semantic Segmentation
Duc Cao Dinh · Seok Joon Kim · Kyusung Cho
Panoramic Semantic Segmentation (PASS) is an important task in computer vision, as it enables semantic understanding of a 360° environment. Most existing work has focused on addressing distortion in 2D panoramic images without considering the spatial properties of indoor scenes. This limits the ability of PASS methods to capture contextual features, which are essential for resolving ambiguity in monocular images. In this paper, we propose a novel approach to semantic segmentation of panoramic images. We treat an indoor panoramic image as a combination of two sets of segments: over-sampled segments, representing planar objects such as the ceiling and floor, and under-sampled segments, representing all other elements. We then tailor the optimization to each group by leveraging geometric properties in two ways. First, we enforce geometrically consistent losses when jointly learning over-sampled segments and dense depth estimation. Second, we exploit the rich geometric representations of the indoor scene in several forms and aggregate them with a proposed Transformer-based Context Module. Combined with a simple high-resolution branch, this module functions as a robust hybrid decoder for estimating under-sampled segments. The decoder not only prevents resolution degradation of the predicted masks but also allows the relationships between different geometric components of the scene to be exploited. Experimental results on the Stanford2D3D panoramic (real-world) and Structured3D (synthetic) datasets demonstrate the effectiveness of the proposed approach, which sets a new state of the art on both datasets, achieving approximately 56.8% and 71.7% mIoU, respectively. The code will be shared publicly if the paper is accepted.
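For readers unfamiliar with the mIoU metric quoted in the abstract, the following is a minimal sketch of how mean Intersection over Union is commonly computed from predicted and ground-truth label maps. It is not the authors' evaluation code: the function name `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes that appear in either map are illustrative assumptions, and benchmark protocols may accumulate the confusion matrix over a whole dataset rather than per image.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes present in pred or target (hypothetical helper)."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop pixels marked as "ignore" in the ground truth (assumed convention).
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection

    # Average only over classes that occur at least once, to avoid 0/0.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


if __name__ == "__main__":
    pred = [[0, 0], [1, 1]]
    target = [[0, 1], [1, 1]]
    print(mean_iou(pred, target, num_classes=2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Predictions are assumed to lie in the range 0 to `num_classes - 1`.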