The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume XLVIII-4/W17-2025
https://doi.org/10.5194/isprs-archives-XLVIII-4-W17-2025-279-2026
15 Jan 2026

3D Building Model Segmentation using GNN and ViT

Hanis Rashidan, Ivin Amri Musliman, Alias Abdul Rahman, and Gurcan Buyuksalih

Keywords: Semantic segmentation, 3D building models, Graph Neural Networks, Vision Transformers

Abstract. Reliable semantics in 3D building models support practical urban tasks such as planning, asset inventory, and maintenance. This paper presents an approach that pairs graph-based geometric reasoning, using a Graph Neural Network (GNN), with image-based appearance analysis, using a Vision Transformer (ViT), to improve building component segmentation. The GNN is first applied to the building mesh to capture structural cues and produce initial labels. Multi-view 2D projections (orthographic and perspective) are then rendered and processed with the ViT to recover visual patterns related to windows, doors, roofs, and walls. The two streams are reconciled through a simple consensus fusion that projects the ViT predictions back onto the 3D geometry and refines the labels. In experiments, the proposed pipeline improves accuracy and classwise consistency over a GNN baseline, with clearer gains on small or visually ambiguous elements.
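
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one way a consensus fusion of this kind could work, not the authors' implementation. It assumes that each rendered view comes with a per-pixel face-index buffer (with -1 marking background), that ViT predictions are per-pixel class labels, and that the GNN outputs per-face class probabilities. The function name `fuse_labels` and the parameter `vit_weight` are hypothetical.

```python
import numpy as np

def fuse_labels(gnn_probs, view_face_ids, view_labels, vit_weight=0.5):
    """Blend per-face GNN probabilities with ViT votes projected from 2D views.

    gnn_probs     : (n_faces, n_classes) class probabilities from the GNN.
    view_face_ids : list of int arrays, one per view; each pixel holds the
                    index of the mesh face it shows, or -1 for background.
    view_labels   : list of int arrays with the same shapes, holding the
                    ViT class label predicted for each pixel.
    vit_weight    : assumed blending weight in [0, 1] (hypothetical).
    Returns an (n_faces,) array of fused class labels.
    """
    gnn_probs = np.asarray(gnn_probs, dtype=float)
    n_faces, n_classes = gnn_probs.shape

    # Back-project: count, per face, how many pixels voted for each class.
    votes = np.zeros((n_faces, n_classes), dtype=float)
    for face_ids, labels in zip(view_face_ids, view_labels):
        face_ids = np.asarray(face_ids).ravel()
        labels = np.asarray(labels).ravel()
        visible = face_ids >= 0
        np.add.at(votes, (face_ids[visible], labels[visible]), 1.0)

    # Faces never seen in any view keep their GNN probabilities unchanged.
    totals = votes.sum(axis=1, keepdims=True)
    seen = totals[:, 0] > 0
    fused = gnn_probs.copy()
    vit_probs = votes[seen] / totals[seen]
    fused[seen] = (1.0 - vit_weight) * gnn_probs[seen] + vit_weight * vit_probs

    return fused.argmax(axis=1)


# Toy example: 3 faces, 2 classes, one 4-pixel view.
gnn = [[0.6, 0.4], [0.9, 0.1], [0.3, 0.7]]
face_ids = [np.array([0, 0, -1, 1])]
labels = [np.array([1, 1, 0, 0])]
fused_labels = fuse_labels(gnn, face_ids, labels)
```

In this toy case the ViT votes overturn the GNN label on face 0, agree with it on face 1, and leave the unseen face 2 untouched. A weighted average is only one possible choice; majority voting or confidence-gated refinement would fit the same description.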
