ISPRS-Archives

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.

ISSN 2194-9034

Copernicus Publications

Göttingen, Germany

DOI: 10.5194/isprs-archives-XLIII-B3-2022-55-2022

BUILDING EXTRACTION FROM HIGH-RESOLUTION REMOTE SENSING IMAGERY BASED ON MULTI-SCALE FEATURE FUSION AND ENHANCEMENT

Chen¹, Cheng¹, Yao¹, Hu

Nanjing University of Posts and Telecommunications, 210003 Nanjing, China

MNR Key Laboratory for Geo-Environmental Monitoring of Great Bay Area, Shenzhen 518060, China

Published: 30 May 2022

Volume XLIII-B3-2022, pages 55-60

2022

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://isprs-archives.copernicus.org/articles/XLIII-B3-2022/55/2022/isprs-archives-XLIII-B3-2022-55-2022.html

The full text article is available as a PDF file from https://isprs-archives.copernicus.org/articles/XLIII-B3-2022/55/2022/isprs-archives-XLIII-B3-2022-55-2022.pdf

Accurate detection and mapping of buildings from high-resolution remote sensing (HRRS) images has attracted extensive attention. However, buildings are artificial targets that vary widely in type, appear at multiple scales, and sit in complex contexts, all of which make their accurate identification challenging. To address this problem, a semantic segmentation model based on multi-scale feature fusion and enhancement (MSFFE) is proposed for building extraction from HRRS images. The model follows an encoder-decoder architecture. In the encoding stage, a densely connected convolutional neural network serves as the encoder and extracts multi-level spatial and semantic features. To exploit the multi-scale characteristics of buildings, a multi-scale feature fusion (MSFF) module is placed between the encoder and the decoder to distinguish buildings of different scales in complex scenes. In the decoding stage, an attention weighted semantic enhancement (AWSE) module is introduced into the decoder to assist up-sampling; it makes full use of the multi-level features output by the encoder and highlights the key local semantic information of buildings. To verify the effectiveness of the proposed model, experiments were conducted on two building segmentation datasets, WHU and INRIA. Preliminary results show that the proposed model effectively identifies buildings of different scales in complex scenes and outperforms representative networks including FCN, U-Net, DeepLabV3+ and MA-FCN.
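The abstract does not specify the internals of the MSFF or AWSE modules. Purely as an illustration of the described data flow, the NumPy sketch that follows assumes two swapped-in, well-known techniques: a pyramid-pooling style fusion (pool the deepest encoder map at several grid sizes, restore its size, concatenate, and fuse with a 1x1 convolution) standing in for MSFF, and a squeeze-and-excitation style channel attention that re-weights an encoder skip feature during up-sampling, standing in for AWSE. All function names, shapes, bin sizes and random weights are hypothetical and are not the authors' design.

```python
import numpy as np


def conv1x1(x, w):
    """1x1 convolution of a (C_in, H, W) map with weights w of shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def adaptive_avg_pool(x, bins):
    """Average-pool a (C, H, W) map into a (C, bins, bins) grid.

    Assumes H and W are divisible by `bins` (a simplification for this sketch).
    """
    c, h, w = x.shape
    return x.reshape(c, bins, h // bins, bins, w // bins).mean(axis=(2, 4))


def upsample_nearest(x, h, w):
    """Nearest-neighbour upsampling of a (C, h0, w0) map to (C, h, w) by integer factors."""
    _, h0, w0 = x.shape
    return x.repeat(h // h0, axis=1).repeat(w // w0, axis=2)


def msff(x, w_fuse, bins=(1, 2, 4)):
    """Hypothetical multi-scale fusion (pyramid-pooling style, an assumption).

    Each branch summarises context at a different scale; the branches are
    concatenated with the input and fused by a 1x1 convolution plus ReLU.
    """
    _, h, w = x.shape
    branches = [x] + [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bins]
    stacked = np.concatenate(branches, axis=0)       # (C * (1 + len(bins)), H, W)
    return np.maximum(conv1x1(stacked, w_fuse), 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def awse(decoder_feat, skip_feat, w1, w2):
    """Hypothetical attention-weighted enhancement (SE-style, an assumption).

    The coarse decoder feature is upsampled to the skip resolution, squeezed by
    global average pooling into per-channel weights in (0, 1), and those weights
    scale the encoder skip feature before it is added to the upsampled feature.
    """
    up = upsample_nearest(decoder_feat, skip_feat.shape[1], skip_feat.shape[2])
    squeezed = up.mean(axis=(1, 2))                          # (C,)
    weights = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))   # (C,)
    return up + weights[:, None, None] * skip_feat


rng = np.random.default_rng(0)
deep = rng.standard_normal((16, 8, 8))             # deepest encoder output (C, H, W)
w_fuse = rng.standard_normal((8, 16 * 4)) * 0.1    # input + 3 pooled branches = 64 channels
fused = msff(deep, w_fuse)                         # (8, 8, 8)
skip = rng.standard_normal((8, 16, 16))            # shallower, higher-resolution encoder feature
w1 = rng.standard_normal((2, 8)) * 0.1             # channel reduction ratio of 4 (assumed)
w2 = rng.standard_normal((8, 2)) * 0.1
enhanced = awse(fused, skip, w1, w2)               # (8, 16, 16)
print(fused.shape, enhanced.shape)                 # (8, 8, 8) (8, 16, 16)
```

In this sketch the skip feature only contributes through the learned channel weights, so channels that the decoder context deems irrelevant are suppressed before fusion; a real implementation would use learned convolutions, batch normalisation and bilinear up-sampling rather than fixed random matrices and nearest-neighbour repetition.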