Class-Agnostic 3D Mask Proposal and 2D-3D Visual Feature Ensemble for Efficient Open-Vocabulary 3D Instance Segmentation 


Vol. 13, No. 7, pp. 335-347, Jul. 2024
10.3745/TKIPS.2024.13.7.335


  Abstract

Open-vocabulary 3D point cloud instance segmentation (OV-3DIS) is a challenging visual task that segments a 3D scene point cloud into object instances of both base and novel classes. In this paper, we propose Open3DME, a novel model for OV-3DIS that addresses important design issues and overcomes limitations of existing approaches. First, to improve the quality of class-agnostic 3D masks, our model uses T3DIS, an advanced Transformer-based 3D point cloud instance segmentation model, as its mask proposal module. Second, to obtain semantically text-aligned visual features for each point cloud segment, our model extracts both 2D and 3D features from the point cloud and the corresponding multi-view RGB images using pretrained CLIP and OpenSeg encoders, respectively. Last, to make effective use of both the 2D and 3D visual features of each point cloud segment during label assignment, our model adopts a unique feature ensemble method. To validate our model, we conducted both quantitative and qualitative experiments on the ScanNet-V2 benchmark dataset, which demonstrated significant performance gains.
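
The abstract does not specify how the 2D and 3D features are combined, so the following is only an illustrative sketch of open-vocabulary label assignment for a single mask, not the ensemble used in Open3DME. It assumes each mask already has one pooled 2D feature and one pooled 3D feature in the same text-aligned embedding space, scores each against the class text embeddings by cosine similarity, and fuses the two softmax distributions with a weighted average. The function names, the `weight_2d` and `temperature` parameters, and the toy vectors are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_scores(feature, text_embeddings):
    """Cosine similarity between one visual feature and every class text embedding."""
    f = l2_normalize(feature)
    return [sum(a * b for a, b in zip(f, l2_normalize(t))) for t in text_embeddings]


def softmax(scores, temperature=0.1):
    """Turn similarity scores into a probability distribution over classes."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assign_label(feat_2d, feat_3d, text_embeddings, class_names, weight_2d=0.5):
    """Fuse 2D and 3D class probabilities for one mask and pick the best class.

    The weighted average is an assumed stand-in for the paper's ensemble method.
    """
    p2d = softmax(cosine_scores(feat_2d, text_embeddings))
    p3d = softmax(cosine_scores(feat_3d, text_embeddings))
    fused = [weight_2d * a + (1.0 - weight_2d) * b for a, b in zip(p2d, p3d)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return class_names[best], fused


# Toy example with made-up 3-dimensional embeddings (real encoders use hundreds of dimensions).
class_names = ["chair", "table", "sofa"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
feat_2d = [0.9, 0.3, 0.1]  # hypothetical pooled 2D feature of one mask
feat_3d = [0.6, 0.7, 0.1]  # hypothetical pooled 3D feature of the same mask

label, fused = assign_label(feat_2d, feat_3d, text_embeddings, class_names)
label_3d_only, _ = assign_label(feat_2d, feat_3d, text_embeddings, class_names, weight_2d=0.0)
```

In this toy case the 3D feature alone favors "table", while the equally weighted fusion selects "chair" because the 2D evidence is more confident. This shows how combining the two modalities can change the assigned label.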

  Cite this article

[IEEE Style]

S. Song, K. Park, I. Kim, "Class-Agnostic 3D Mask Proposal and 2D-3D Visual Feature Ensemble for Efficient Open-Vocabulary 3D Instance Segmentation," The Transactions of the Korea Information Processing Society, vol. 13, no. 7, pp. 335-347, 2024. DOI: 10.3745/TKIPS.2024.13.7.335.

[ACM Style]

Sungho Song, Kyungmin Park, and Incheol Kim. 2024. Class-Agnostic 3D Mask Proposal and 2D-3D Visual Feature Ensemble for Efficient Open-Vocabulary 3D Instance Segmentation. The Transactions of the Korea Information Processing Society, 13, 7, (2024), 335-347. DOI: 10.3745/TKIPS.2024.13.7.335.