The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Articles | Volume XLVIII-M-3-2023
https://doi.org/10.5194/isprs-archives-XLVIII-M-3-2023-273-2023
05 Sep 2023

DCTNET: HYBRID NETWORK MODEL FUSING WITH MULTISCALE DEFORMABLE CNN AND TRANSFORMER STRUCTURE FOR ROAD EXTRACTION FROM GAOFEN SATELLITE REMOTE SENSING IMAGE

Q. Yuan

Keywords: road extraction, Transformer, convolutional neural network, deep learning

Abstract. Urban road network detection and extraction have significant applications in many domains, such as intelligent transportation and navigation, urban planning, and autonomous driving. Although manual annotation can produce accurate road network maps, its low efficiency and high cost make it insufficient for current tasks. Traditional methods based on spectral or geometric information rely on shallow features and often achieve low semantic segmentation accuracy against complex remote sensing backgrounds. In recent years, deep convolutional neural networks (CNNs) have provided robust feature representations for distinguishing complex terrain objects. However, these CNNs ignore the fusion of global and local contexts, and roads are often confused with other feature types, especially buildings. In addition, conventional convolution operations aggregate local feature information with a fixed template paradigm, whereas road features present complex linear-shape geometric relationships, which hinders feature construction. To address these issues, we propose a hybrid network structure that combines the advantages of CNN and Transformer models. Specifically, a multiscale deformable convolution module is developed to adaptively capture local road context. A Transformer is introduced into the encoder to enhance semantic information and build the global context, and the CNN features are fused with the Transformer features. Finally, the model outputs a road extraction prediction map at high spatial resolution. Quantitative analysis and visual inspection confirm that the proposed model effectively and automatically extracts road features from complex remote sensing backgrounds, outperforming state-of-the-art methods with an IoU of 86.5% and an OA of 97.4%.
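The core idea of the abstract — fusing local convolutional features with a global Transformer-style context — can be illustrated with a minimal NumPy sketch. This is not the authors' DCTNet implementation; the kernel, the single-head attention, and the additive fusion below are simplified, hypothetical stand-ins for the multiscale deformable convolution module and the Transformer branch described in the paper.

```python
import numpy as np

def conv2d_local(x, kernel):
    """Naive 'same'-padded 2D convolution: captures LOCAL context per pixel."""
    H, W = x.shape
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def self_attention(tokens):
    """Single-head scaled dot-product attention over ALL tokens (global context)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax rows
    return weights @ tokens

def fuse_local_global(x, kernel):
    """Toy fusion: add a globally attended signal onto local conv features.
    In the paper the fusion happens between CNN and Transformer encoder
    features; here we simply treat each pixel as a 1-D token."""
    local = conv2d_local(x, kernel)
    tokens = local.reshape(-1, 1)          # (H*W, 1) pixel tokens
    global_ctx = self_attention(tokens)    # every pixel attends to every pixel
    return local + global_ctx.reshape(x.shape)
```

The sketch preserves the key property the abstract argues for: the convolution sees only a fixed local window, while the attention step lets every pixel aggregate information from the whole image before the two signals are combined.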