The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Articles | Volume XLVIII-1/W5-2025
https://doi.org/10.5194/isprs-archives-XLVIII-1-W5-2025-153-2025
05 Nov 2025

BEV Space LiDAR-Camera Fusion Methods Based on Attention Driven Feature Fusion Mechanism

Leheng Xu, Minglei Li, Cong Zhou, Jiahui Chai, and Junnan Zhang

Keywords: Multi-scale Attention, Bird’s Eye View, LiDAR-Camera Fusion, Object Detection

Abstract. To address perception challenges for autonomous vehicles and drones in complex urban environments, this paper proposes a novel Bird’s Eye View (BEV) fusion method, MSA-BEVFusion, which integrates LiDAR and RGB cameras via multi-scale attention mechanisms. Unlike existing methods that tightly couple image and LiDAR features, or BEV-based approaches that rely on simplistic convolutional fusion, our method first integrates multi-scale image features through the MFPN module and then employs multi-scale attention enhancement to achieve deep fusion between camera and LiDAR features before feeding them into the detection head, ultimately delivering superior detection performance. Experiments on the nuScenes dataset demonstrate excellent performance, achieving 0.2% NDS and 0.4% mAP improvements over BEVFusion-MIT. The method shows robust 3D detection in dark, rainy, and snowy conditions, with enhanced accuracy for small or occluded objects. Attention heatmaps reveal effective cross-modal alignment, synergizing LiDAR’s geometric precision with the camera’s texture details. This work bridges modality gaps through bidirectional interaction, advancing robust environmental perception while mitigating spatial discordance in unified BEV representations.
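The core idea described in the abstract, weighting camera and LiDAR BEV features against each other before the detection head rather than simply concatenating them, can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's MSA-BEVFusion implementation: the function name `attention_fuse_bev` and the single-scale, per-cell softmax weighting are assumptions standing in for the paper's multi-scale attention and MFPN modules.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse_bev(cam_bev, lidar_bev):
    """Fuse two BEV feature maps of shape (C, H, W) with per-cell
    attention weights. Hypothetical sketch: the real method applies
    attention at multiple scales, not a single mean-activation score.
    """
    # Score each modality per BEV cell by its mean feature activation.
    scores = np.stack([cam_bev.mean(axis=0), lidar_bev.mean(axis=0)])  # (2, H, W)
    # Normalize the two scores into convex fusion weights per cell.
    w = softmax(scores, axis=0)                                        # (2, H, W)
    # Weighted sum: each BEV cell blends the two modalities.
    return w[0] * cam_bev + w[1] * lidar_bev                           # (C, H, W)

# Toy example with random features standing in for encoder outputs.
rng = np.random.default_rng(0)
cam = rng.random((8, 4, 4), dtype=np.float32)
lidar = rng.random((8, 4, 4), dtype=np.float32)
fused = attention_fuse_bev(cam, lidar)
```

Because the weights sum to one at every BEV cell, the fused feature is a convex combination of the two modalities there, which is what lets, e.g., LiDAR geometry dominate where image features are weak (dark or rainy scenes) and vice versa.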
