SPDC: A SUPER-POINT AND POINT COMBINING BASED DUAL-SCALE CONTRASTIVE LEARNING NETWORK FOR POINT CLOUD SEMANTIC SEGMENTATION
Keywords: Point cloud semantic segmentation, Dual scale contrastive learning, Super point generation, Dynamic data augmentation
Abstract. Semantic segmentation is one of the fundamental tasks of point cloud processing and the basis for many downstream tasks. Deep learning has become the dominant approach to point cloud processing. Most existing 3D deep learning models require large amounts of annotated point cloud data, but annotation is costly in both time and money. To reduce the dependence of semantic segmentation on large amounts of annotated training data, this paper proposes a Super-point-level and Point-level Dual-scale Contrastive learning network (SPDC). Because contrastive learning is difficult to train and can extract insufficient features, we introduce super-point maps to assist the network in feature extraction. We use a pre-trained super-point generation network to convert the original point cloud into a super-point map. A dynamic data augmentation (DDA) module is designed for the super-point maps and used for super-point-level contrastive learning. We then map the extracted super-point-level features back to the original point-level scale and perform a second stage of contrastive learning against the original point features. The entire feature extraction network shares parameters, and to reduce the number of parameters we use the lightweight DGCNN (encoder) + self-attention as the backbone network. We also pre-train the backbone network in a few-shot manner so that it converges more easily. By analogy with CutMix, we design a new point cloud data augmentation method called PointObjectMix (POM), which addresses the sample imbalance problem while preserving the overall characteristics of the objects in the scene. Our method achieves 63.3% mIoU on the S3DIS dataset. Extensive ablation experiments verify the effectiveness of each module. Experimental results show that our method outperforms the best available unsupervised network.
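The dual-scale idea described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper. It assumes an InfoNCE-style contrastive loss (the abstract does not name the loss), a hypothetical index array `point_to_sp` assigning each point to its super-point, and a hypothetical weighting factor `weight`; all function names are illustrative only.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(a, b, temperature=0.1):
    """InfoNCE-style loss (an assumption, not confirmed by the paper).

    Row i of `a` and row i of `b` form a positive pair; every other
    row of `b` acts as a negative for row i of `a`.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def superpoint_to_point(sp_feats, point_to_sp):
    """Map super-point features back to point scale by index lookup."""
    return sp_feats[point_to_sp]

def dual_scale_loss(sp_view1, sp_view2, point_feats, point_to_sp, weight=1.0):
    """Super-point-level loss plus a weighted point-level loss (hypothetical form)."""
    # Scale 1: contrast two augmented views of the same super-points.
    sp_loss = info_nce(sp_view1, sp_view2)
    # Scale 2: contrast mapped-back super-point features with point features.
    # Simplification: points sharing a super-point still count as negatives here.
    mapped = superpoint_to_point(sp_view1, point_to_sp)
    pt_loss = info_nce(mapped, point_feats)
    return sp_loss + weight * pt_loss

# Toy usage with random features: 4 super-points, 6 points, 8 channels.
rng = np.random.default_rng(0)
sp_view1 = rng.normal(size=(4, 8))
sp_view2 = sp_view1 + 0.05 * rng.normal(size=(4, 8))  # stands in for an augmented view
point_to_sp = np.array([0, 0, 1, 2, 3, 3])
point_feats = rng.normal(size=(6, 8))
print(dual_scale_loss(sp_view1, sp_view2, point_feats, point_to_sp))
```

In this sketch the index lookup `sp_feats[point_to_sp]` is what brings the two scales into the same shape, so that the second contrastive term can compare features point by point.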