ISPRS-Archives

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.

2194-9034

Copernicus Publications

Göttingen, Germany

10.5194/isprs-archives-XLVIII-1-W5-2025-185-2025

Semantic-Consistent 3D Reconstruction via Gaussian Splatting and SAM-Guided Annotation

Zhang, Zhaoning¹; Wang, Tengfei¹; Xu, Zipeng¹; Ji, Quanjian¹; Wang, Xin¹; Zhan, Zongqian

¹ School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China

05 11 2025

Volume XLVIII-1/W5-2025, pages 185–192

2025

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://isprs-archives.copernicus.org/articles/XLVIII-1-W5-2025/185/2025/isprs-archives-XLVIII-1-W5-2025-185-2025.html

The full text article is available as a PDF file from https://isprs-archives.copernicus.org/articles/XLVIII-1-W5-2025/185/2025/isprs-archives-XLVIII-1-W5-2025-185-2025.pdf

3D Gaussian Splatting (3DGS) provides a novel paradigm for multi-view semantic scene reconstruction, offering explicit and fine-grained geometric representations. However, accurate semantic expression in reconstructed scenes remains difficult: existing methods that rely on multi-view 2D semantic projections inherently suffer from ambiguous semantic boundaries and inconsistent labels across views. These limitations arise from occlusions, illumination variations, and object deformations, which introduce conflicting label assignments across viewpoints and thereby degrade the semantic fidelity of the 3D reconstruction. This paper proposes a geometry-verified multi-view semantic scene reconstruction framework. First, a depth-aware projection aligns 2D semantic masks with the 3DGS-reconstructed point cloud, and a conflict-locking mechanism filters semantic ambiguities in the multi-view annotations. Second, a geometry-aware semantic propagation model diffuses semantic labels globally by exploiting the local geometric continuity of the point cloud. Experiments show that the proposed framework achieves markedly better reconstruction consistency than conventional methods in occlusion-heavy scenes. Specifically, on the BeDOI-GB dataset it outperforms multi-view voting strategies, with improvements of 12.39% in Overall Accuracy (OA) and 17.03% in mean Intersection over Union (mIoU). Project page: https://bigbigman233.github.io/SC3D.github.io/
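The two stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the depth tolerance `tol`, the agreement threshold `min_agreement` used as a stand-in for conflict locking, and the k-nearest-neighbour majority vote used as a stand-in for geometry-aware propagation are all assumptions made for illustration.

```python
import numpy as np

def project_labels(points, K, R, t, depth, mask, tol=0.05):
    """Label 3D points from one view; -1 marks points that are not visible.

    Assumed inputs: pinhole intrinsics K, world-to-camera pose (R, t),
    a rendered depth map and a 2D semantic mask of the same size.
    """
    cam = points @ R.T + t                      # world -> camera coordinates
    z = cam[:, 2]
    labels = np.full(len(points), -1, dtype=int)
    front = z > 1e-6                            # keep points in front of the camera
    uvw = cam[front] @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = mask.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(front)[inside]
    u, v = u[inside], v[inside]
    # Depth-aware test: accept a point only if it lies on the visible surface.
    visible = np.abs(depth[v, u] - z[idx]) < tol
    labels[idx[visible]] = mask[v[visible], u[visible]]
    return labels

def fuse_with_conflict_lock(view_labels, num_classes, min_agreement=0.8):
    """Fuse (V, N) per-view labels; points whose views disagree stay -1 (locked)."""
    n_points = view_labels.shape[1]
    votes = np.zeros((n_points, num_classes), dtype=int)
    for lab in view_labels:
        seen = lab >= 0
        np.add.at(votes, (np.flatnonzero(seen), lab[seen]), 1)
    total = votes.sum(axis=1)
    agreement = votes.max(axis=1) / np.maximum(total, 1)
    confident = (total > 0) & (agreement >= min_agreement)
    return np.where(confident, votes.argmax(axis=1), -1)

def propagate(points, labels, k=8, iters=10):
    """Fill -1 labels by majority vote over the k nearest neighbours.

    Uses a dense O(N^2) distance matrix, so it only suits small clouds.
    """
    labels = labels.copy()
    dist = np.linalg.norm(points[:, None] - points[None], axis=-1)
    nn = np.argsort(dist, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    for _ in range(iters):
        unknown = np.flatnonzero(labels < 0)
        if unknown.size == 0:
            break
        updated = labels.copy()
        for i in unknown:
            neigh = labels[nn[i]]
            neigh = neigh[neigh >= 0]
            if neigh.size:
                updated[i] = np.bincount(neigh).argmax()
        labels = updated
    return labels
```

In this sketch a point is labelled directly only when it passes the depth test in at least one view and the views that see it largely agree; every other point is left unlabelled and later inherits a label from its spatial neighbours. A spatial index such as a k-d tree would replace the dense distance matrix for clouds of realistic size.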