Building Extraction Network Based on High-Resolution Remote Sensing Images
Keywords: Building extraction, Poly kernel inception network, Multi-scale feature fusion, High-resolution remote sensing images
Abstract. Segmenting buildings from the background in high-resolution remote sensing images faces several challenges, including difficulty in extracting multi-scale information, insufficient capture of long-range contextual information, and the underutilization of multi-scale features. Existing methods often struggle to capture features at different scales effectively, which limits segmentation accuracy. Long-range contextual information is also frequently overlooked, hindering the model’s ability to understand the global structure of buildings. In addition, balancing low-level detail with high-level semantic information makes it difficult to fuse multi-scale features from high-resolution imagery effectively. To address these issues, this paper proposes the Multi-Scale Multi-Kernel Building Extraction Network (MMAENet), which integrates the Poly Kernel Inception Network (PKINet) to substantially enhance multi-scale feature capture and improve the modelling of long-range contextual information. A Panoramic Feature Pyramid (PFP) structure is introduced to ensure full integration of both high-level and low-level information. Performance evaluation on the WHU Aerial dataset demonstrates that the model achieves higher building segmentation accuracy than ConvNeXt, PSPNet, and Swin Transformer.
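
To illustrate the multi-kernel idea the abstract refers to, the sketch below shows a simplified inception-style block that applies depthwise convolutions with progressively larger kernels and fuses the responses. The module name (PolyKernelBlock), the kernel sizes, and the channel handling are assumptions made for exposition; this is not the exact PKINet or MMAENet implementation described in the paper.

# Illustrative sketch only (PyTorch); names and hyperparameters are assumed.
import torch
import torch.nn as nn

class PolyKernelBlock(nn.Module):
    """Captures multi-scale features by running depthwise convolutions with
    increasing kernel sizes over a shared 1x1-projected input, then fusing
    the branch outputs with a pointwise convolution and a residual link."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        self.proj_in = nn.Conv2d(channels, channels, kernel_size=1)
        # One depthwise branch per kernel size; padding keeps the spatial size fixed.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=k,
                      padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        self.proj_out = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x):
        y = self.act(self.proj_in(x))
        # Sum the branch responses so local detail (small kernels) and wider
        # context (large kernels) are mixed without inflating the channel count.
        y = sum(branch(y) for branch in self.branches)
        return x + self.proj_out(y)  # residual connection

if __name__ == "__main__":
    feat = torch.randn(1, 64, 128, 128)       # a hypothetical encoder feature map
    print(PolyKernelBlock(64)(feat).shape)    # torch.Size([1, 64, 128, 128])

The large depthwise kernels widen the receptive field cheaply, which is one common way to approximate long-range context; how MMAENet combines this with the PFP fusion of high- and low-level features is detailed in the method section.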