ISPRS-Archives

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.

2194-9034

Copernicus Publications

Göttingen, Germany

10.5194/isprs-archives-XLIII-B3-2022-559-2022

URBAN CLASSIFICATION BASED ON TOP-VIEW POINT CLOUD AND SAR IMAGE FUSION WITH SWIN TRANSFORMER

Xue ¹ ², Zhang ², Soergel

National Lab of Radar Signal Processing, Xidian University, 710071 Xi’an, China

Institute for Photogrammetry, University of Stuttgart, 70174 Stuttgart, Germany

30 May 2022

Volume XLIII-B3-2022, pages 559–564

2022

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://isprs-archives.copernicus.org/articles/XLIII-B3-2022/559/2022/isprs-archives-XLIII-B3-2022-559-2022.html

The full text article is available as a PDF file from https://isprs-archives.copernicus.org/articles/XLIII-B3-2022/559/2022/isprs-archives-XLIII-B3-2022-559-2022.pdf

Urban areas are complex scenes composed of objects made of various materials. This variety poses a challenge to classification schemes that rely on a single data source. In this paper, we propose a feature fusion and classification network for RGB top-view point cloud and SAR images based on the Swin Transformer. In this network, the heterogeneous features are first learned separately by an asymmetric encoder, then concatenated along the channel dimension and fed into a fusing encoder. Finally, the fused features are decoded by a UperNet to generate the semantic labels. As data, we use the high-resolution 3D point cloud provided by the Hessigheim benchmark, complemented by TerraSAR-X images. The overall precision and the mean intersection over union (mIoU) reach 87.25% and 73.56%, respectively, outperforming the single-data Swin Transformer by 4.08% and 1.91%.
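
The fusion step described in the abstract, in which separately learned features are concatenated along the channel dimension before entering the fusing encoder, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `concat_fuse`, the (C, H, W) layout, the channel counts 96 and 48, and the 32 x 32 spatial size are all hypothetical values chosen for illustration, and random arrays stand in for the outputs of the two encoder branches.

```python
import numpy as np


def concat_fuse(rgb_feat, sar_feat):
    """Concatenate two (C, H, W) feature maps along the channel axis.

    Hypothetical helper: the two inputs stand in for the outputs of the
    two encoder branches; they may differ in channel count but must
    share the same spatial size.
    """
    if rgb_feat.ndim != 3 or sar_feat.ndim != 3:
        raise ValueError("expected feature maps of shape (C, H, W)")
    if rgb_feat.shape[1:] != sar_feat.shape[1:]:
        raise ValueError("spatial sizes must match before concatenation")
    # Axis 0 is the channel axis in the assumed (C, H, W) layout.
    return np.concatenate([rgb_feat, sar_feat], axis=0)


# Random placeholders for the branch outputs (assumed shapes).
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((96, 32, 32))  # top-view point cloud branch
sar_feat = rng.standard_normal((48, 32, 32))  # SAR branch

# The fused map keeps the spatial size and stacks the channels.
fused = concat_fuse(rgb_feat, sar_feat)
```

In this sketch the fused array has 96 + 48 = 144 channels at the same spatial size. A fusing encoder would then operate on that stacked tensor, so any mismatch in spatial resolution between the two branches has to be resolved before this step.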