Querying 3D point clouds by exploiting open-vocabulary semantic segmentation of images
Keywords: deep learning, point cloud, query, open-vocabulary
Abstract. While deep models have advanced 3D data analysis and demonstrated impressive results, they often struggle to generalize to classes that are absent from the training dataset. Recently, open-vocabulary and zero-shot models have addressed this problem. However, these models still rely on training data and task-specific fine-tuning, a requirement that limits their applicability in real-world scenarios. In this research, we propose an open-vocabulary method for point cloud segmentation that requires no training data beyond the images and point cloud of the surveyed scene. By combining the capabilities of 2D open-vocabulary models and geometric features of the 3D data with an XGBoost-guided region growing algorithm, our approach segments the queried objects directly in 3D scenes. We evaluate our method on the Replica and ScanNet 3D benchmark datasets, demonstrating its practicality and scalability to real-world scenarios with limited data.
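To make the core idea concrete, below is a minimal, hypothetical sketch of XGBoost-guided region growing, assuming per-point geometric features and seed points obtained by back-projecting 2D open-vocabulary masks into the point cloud. All names (`grow_region`, the feature layout, the synthetic pseudo-labels) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: region growing where an XGBoost classifier,
# trained on pseudo-labels from back-projected 2D open-vocabulary masks,
# decides whether each neighboring point joins the queried region.
import numpy as np
from scipy.spatial import cKDTree
from xgboost import XGBClassifier


def grow_region(points, features, seed_idx, clf, k=8, threshold=0.5):
    """Expand from seed points; the classifier accepts or rejects neighbors."""
    tree = cKDTree(points)
    in_region = np.zeros(len(points), dtype=bool)
    in_region[seed_idx] = True
    frontier = list(seed_idx)
    while frontier:
        # Gather k nearest neighbors of the current frontier points.
        _, nbrs = tree.query(points[frontier], k=k)
        candidates = np.unique(nbrs.ravel())
        candidates = candidates[~in_region[candidates]]
        if candidates.size == 0:
            break
        # XGBoost scores each candidate on its geometric features.
        prob = clf.predict_proba(features[candidates])[:, 1]
        accepted = candidates[prob >= threshold]
        in_region[accepted] = True
        frontier = accepted.tolist()
    return in_region


# Toy usage with synthetic data: 1000 points with 6 geometric features
# each (e.g., normals and curvature). In the actual pipeline, the seeds
# and pseudo-labels would come from 2D open-vocabulary segmentation masks.
rng = np.random.default_rng(0)
points = rng.random((1000, 3))
features = rng.random((1000, 6))
pseudo_labels = (points[:, 0] < 0.3).astype(int)  # stand-in for mask labels
clf = XGBClassifier(n_estimators=50, max_depth=4, eval_metric="logloss")
clf.fit(features, pseudo_labels)
seeds = np.flatnonzero(pseudo_labels)[:10]
mask = grow_region(points, features, seeds, clf)
print(f"segmented {mask.sum()} of {len(points)} points")
```

The design choice sketched here is that the 2D model supplies only sparse, view-dependent evidence (the seeds), while the learned classifier on 3D geometric features propagates the query to points never confidently covered by any image, which is what removes the need for additional training data.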