Coupling V-SLAM and Semantic Segmentation for Cultural Heritage Documentation
Keywords: Multi-camera Mobile Mapping System, V-SLAM and 3D Reconstruction, Deep Learning, Heritage Documentation, Architectural and Pathologies Semantics, 2D-to-3D Enrichment
Abstract. 3D digitization has become an essential tool in cultural heritage documentation, offering unprecedented opportunities for preservation, analysis, and dissemination. Beyond capturing 3D spatial geometry, the semantic enrichment of 3D models is rapidly evolving, enabling more efficient interpretation and use of 3D data. Traditionally, 3D semantic enrichment has relied on point cloud-based segmentation. However, point cloud-based segmentation approaches can struggle to efficiently identify small-scale geometric elements or visually ambiguous classes, limiting their applicability in such contexts. This study leverages the rich contextual and textural information of 2D imagery to detect challenging semantic categories, such as fine architectural elements (e.g., individual stone blocks) and material decay (e.g., material detachment and material loss), using deep learning-based 2D semantic segmentation techniques. These detections are then projected into 3D space through a 2D-to-3D semantic segmentation framework that couples the V-SLAM and 3D reconstruction results with the 2D predictions. The framework is evaluated on data acquired with the fish-eye multi-camera mobile mapping system ATOM-ANT3D in two challenging case-study environments. The achieved results demonstrate a reliable level of accuracy given the inherent complexity of the targeted classes, enhancing the interpretability of 3D models by providing meaningful, metrically interpreted object classifications. (Demonstration video: https://youtu.be/GidxhNS7ECc)
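The core 2D-to-3D enrichment step described above, transferring per-pixel class predictions onto a point cloud using known camera poses, can be illustrated with a minimal sketch. This is an assumption for illustration only: it uses a simple pinhole projection (the paper's ATOM-ANT3D system uses fish-eye cameras, whose projection model would replace that step), and the function name and signature are hypothetical.

```python
import numpy as np

def project_labels_to_points(points_w, K, R, t, label_map):
    """Assign 2D segmentation labels to 3D points (illustrative sketch).

    points_w  : (N, 3) point coordinates in the world frame
    K         : (3, 3) pinhole intrinsics (a fish-eye model would replace
                the perspective projection below)
    R, t      : world-to-camera rotation (3, 3) and translation (3,),
                e.g. from V-SLAM pose estimates
    label_map : (H, W) integer class map predicted by a 2D network
    Returns an (N,) label array; -1 marks points not visible in this view.
    """
    H, W = label_map.shape
    pts_c = points_w @ R.T + t                     # world -> camera frame
    labels = np.full(len(points_w), -1, dtype=int)
    in_front = pts_c[:, 2] > 0                     # keep points ahead of the camera
    uvw = pts_c[in_front] @ K.T                    # perspective projection
    uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    idx = np.flatnonzero(in_front)[valid]
    labels[idx] = label_map[uv[valid, 1], uv[valid, 0]]  # sample the class map
    return labels
```

In a full pipeline, labels gathered from many overlapping views would be fused per point (e.g. by majority vote), and occlusion checks would prevent a foreground prediction from being written onto hidden geometry; both are omitted here for brevity.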