Towards annotation-less semantic segmentation of aerial point clouds
Keywords: point cloud, semantic segmentation, language models, deep learning, photogrammetry
Abstract. The ability to automatically recognize a wide variety of objects in complex 3D urban environments, without relying on predefined categories or annotated training data, is becoming increasingly important for end-users of large-scale geospatial 3D datasets. Because the objects of interest in urban scenes vary considerably across locations, users, and applications, flexible annotation-free methods for 3D semantic segmentation are increasingly desirable. In this work, we present and compare two approaches for classifying aerial photogrammetric point clouds. The first employs conventional supervised 3D neural networks trained on annotated datasets with predefined object classes. The second adopts a training-free, open-vocabulary strategy that detects objects directly in the images and subsequently projects and refines the detections in 3D space. Both approaches are evaluated through quantitative metrics and qualitative analysis, providing insights into their respective capabilities and limitations in 3D urban areas.
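To make the 2D-to-3D step of the second approach concrete, the snippet below gives a minimal sketch of single-view label transfer, assuming a pinhole camera model with known intrinsics K and pose (R, t) and a per-pixel label map produced by an open-vocabulary image segmenter. The function name and parameters are illustrative and not taken from the paper.

```python
import numpy as np

def project_labels_to_points(points, label_map, K, R, t, unlabeled=-1):
    """Assign each 3D point the class of the pixel it projects onto.

    points    : (N, 3) world coordinates
    label_map : (H, W) integer class IDs from a 2D open-vocabulary segmenter
    K         : (3, 3) camera intrinsics
    R, t      : (3, 3) rotation and (3,) translation mapping world -> camera
    """
    cam = points @ R.T + t                              # world -> camera frame
    z = cam[:, 2:3]
    uv = (cam @ K.T)[:, :2] / np.where(z > 0, z, 1.0)   # safe perspective divide
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)

    h, w = label_map.shape
    visible = (cam[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    labels = np.full(len(points), unlabeled, dtype=int)
    labels[visible] = label_map[v[visible], u[visible]]  # transfer pixel labels
    return labels
```

This sketch covers only the single-view transfer; in a full pipeline, labels from many overlapping aerial views would be fused (e.g., by per-point majority voting), occlusions resolved with depth checks, and the result refined geometrically in 3D.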
