The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume XLVIII-G-2025
https://doi.org/10.5194/isprs-archives-XLVIII-G-2025-1785-2025
02 Aug 2025

Remote sensing semantic segmentation based on multimodal feature alignment and fusion

Boshen Chang and Timo Balz

Keywords: Land Use, Haar Transform, Feature Aligning

Abstract. Accurate semantic segmentation of remote sensing data is essential to geoscience research and applications. Compared with traditional single-modal segmentation techniques, models based on multi-modal fusion have demonstrated superior performance and have attracted considerable attention in recent years. However, most of these models rely on convolutional neural networks (CNNs) or vision transformers (ViTs) for fusion operations, which results in inadequate modelling and representation of local-global context. In this study, we propose a multi-layer multi-modal feature alignment and fusion scheme, MFAFUNet, that aims to provide a robust and effective multi-modal fusion backbone for semantic segmentation. The overall framework is analogous to that of U-Net. First, the data from the different modalities are aggregated and the image size is reduced by multi-level downsampling modules based on the Haar wavelet transform. High-frequency and low-frequency information is extracted from the features by a feature extraction module composed of a CNN and a ViT. Second, a semantic distribution alignment loss maps the high-level features of the different modalities into a common latent space and aligns their distributions, so as to associate the complementary cues hidden in each modality. Experiments demonstrate the effectiveness of the proposed method.
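
The Haar-wavelet downsampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the module's internals, so the code below is only one plausible reading, under stated assumptions: a single-level orthonormal 2D Haar transform applied per channel, with the four sub-bands stacked along the channel axis. The function name `haar_downsample`, the channels-first layout, and the toy optical/elevation inputs are hypothetical and not taken from the paper.

```python
import numpy as np


def haar_downsample(x):
    """Single-level orthonormal 2D Haar transform used as a downsampler.

    Hypothetical helper, not the paper's module.
    x: array of shape (C, H, W) with even H and W.
    Returns an array of shape (4 * C, H // 2, W // 2) holding, in order,
    the low-frequency band and three high-frequency detail bands.
    """
    if x.ndim != 3 or x.shape[1] % 2 or x.shape[2] % 2:
        raise ValueError("expected shape (C, H, W) with even H and W")

    # The four pixels of every non-overlapping 2x2 block.
    a = x[:, 0::2, 0::2]  # top-left
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right

    low = (a + b + c + d) / 2.0       # block average (scaled): low frequency
    row_diff = (a + b - c - d) / 2.0  # top row minus bottom row
    col_diff = (a - b + c - d) / 2.0  # left column minus right column
    diag = (a - b - c + d) / 2.0      # diagonal difference

    return np.concatenate([low, row_diff, col_diff, diag], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed toy inputs: a 3-band optical tile and a 1-band elevation tile.
    optical = rng.random((3, 64, 64))
    elevation = rng.random((1, 64, 64))

    # Aggregate the modalities by channel concatenation (an assumption),
    # then halve the spatial size with the Haar transform.
    stacked = np.concatenate([optical, elevation], axis=0)
    out = haar_downsample(stacked)
    print(out.shape)  # (16, 32, 32)
```

Unlike strided convolution or pooling, this transform is invertible and preserves signal energy, so the spatial size is halved without discarding information. Whether MFAFUNet keeps all four sub-bands or processes them further is not stated in the abstract.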
