3D Building Model Segmentation using GNN and ViT

Rashidan, Hanis; Musliman, Ivin Amri; Abdul Rahman, Alias; Buyuksalih, Gurcan

doi:10.5194/isprs-archives-XLVIII-4-W17-2025-279-2026

Articles | Volume XLVIII-4/W17-2025

https://doi.org/10.5194/isprs-archives-XLVIII-4-W17-2025-279-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


15 Jan 2026




Keywords: Semantic segmentation, 3D building models, Graph Neural Networks, Vision Transformers

Abstract. Reliable semantics in 3D building models support practical urban tasks such as planning, asset inventory, and maintenance. This paper presents an approach that pairs graph-based geometry, handled by a Graph Neural Network (GNN), with image-based appearance, handled by a Vision Transformer (ViT), to improve component segmentation. The GNN is first applied to the building mesh to capture structural cues and produce initial labels. Multi-view 2D projections (orthographic and perspective) are then rendered and processed with the ViT to recover visual patterns associated with windows, doors, roofs, and walls. The two streams are reconciled through a simple consensus fusion that projects the ViT predictions back onto the 3D geometry and refines the initial labels. In experiments, the proposed pipeline improves accuracy and class-wise consistency over a GNN baseline, with clearer gains on small or visually ambiguous elements.
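The abstract does not specify how the consensus fusion is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the GNN yields per-face class probabilities, that each rendered view comes with a per-pixel ViT label map and a per-pixel face-index map (the index of the mesh face visible at that pixel, or -1 for background), and that the two streams are blended with a fixed weight. The function name `consensus_fusion`, the parameter `vit_weight`, and the class ordering are all hypothetical.

```python
import numpy as np


def consensus_fusion(gnn_probs, view_labels, view_face_ids, vit_weight=0.5):
    """Blend per-face GNN probabilities with back-projected ViT pixel votes.

    gnn_probs:     (F, C) array of per-face class probabilities (geometry stream).
    view_labels:   list of (H, W) int arrays with the ViT class of each pixel.
    view_face_ids: list of (H, W) int arrays with the visible face index per
                   pixel, or -1 where the pixel shows background.
    vit_weight:    assumed blending weight for the image stream (hypothetical).
    """
    num_faces, num_classes = gnn_probs.shape
    votes = np.zeros((num_faces, num_classes), dtype=np.float64)

    # Back-project: every visible pixel casts one vote for (face, class).
    for labels, face_ids in zip(view_labels, view_face_ids):
        visible = face_ids >= 0
        np.add.at(votes, (face_ids[visible], labels[visible]), 1.0)

    # Normalise votes into a per-face distribution for faces seen in any view.
    totals = votes.sum(axis=1)
    seen = totals > 0
    vit_probs = np.zeros_like(votes)
    vit_probs[seen] = votes[seen] / totals[seen, None]

    # Faces never seen in any view keep their GNN prediction unchanged.
    fused = gnn_probs.astype(np.float64).copy()
    fused[seen] = (1.0 - vit_weight) * fused[seen] + vit_weight * vit_probs[seen]
    return fused.argmax(axis=1)


# Toy example with an assumed class order: 0=wall, 1=roof, 2=window, 3=door.
gnn_probs = np.array([
    [0.60, 0.40, 0.00, 0.00],  # face 0: GNN says wall
    [0.55, 0.00, 0.45, 0.00],  # face 1: GNN narrowly says wall
    [0.10, 0.90, 0.00, 0.00],  # face 2: GNN says roof, not visible in the view
])
face_ids = np.array([[0, 1], [1, -1]])  # one 2x2 rendered view
vit_labels = np.array([[0, 2], [2, 0]])  # ViT sees face 1 as window

fused_labels = consensus_fusion(gnn_probs, [vit_labels], [face_ids])
print(fused_labels.tolist())  # prints [0, 2, 1]
```

In this toy case the image stream overturns the uncertain geometric label on face 1 (wall to window), which mirrors the abstract's claim that gains concentrate on small or visually ambiguous elements; the unseen face simply retains its GNN label.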
