A BEV-Space LiDAR-Camera Fusion Method Based on an Attention-Driven Feature Fusion Mechanism
Keywords: Multi-scale Attention, Bird’s Eye View, LiDAR-Camera Fusion, Object Detection
Abstract. To address the perception challenges faced by autonomous vehicles and drones in complex urban environments, this paper proposes MSA-BEVFusion, a novel Bird’s Eye View (BEV) fusion method that integrates LiDAR and RGB camera data via multi-scale attention mechanisms. Unlike existing methods that tightly couple image and LiDAR features, or BEV-based approaches that rely on simplistic convolutional fusion, our method first integrates multi-scale image features through the MFPN module and then applies multi-scale attention enhancement to achieve deep fusion between camera and LiDAR features before feeding them into the detection head, ultimately delivering superior detection performance. Experiments on the nuScenes dataset demonstrate strong performance, with improvements of 0.2% NDS and 0.4% mAP over BEVFusion-MIT. The method achieves robust 3D detection in dark, rainy, and snowy conditions, with improved accuracy for small and occluded objects. Attention heatmaps reveal effective cross-modal alignment, combining LiDAR’s geometric precision with the camera’s texture detail. By enabling bidirectional interaction between modalities, this work bridges the modality gap and mitigates spatial misalignment in unified BEV representations, advancing robust environmental perception.
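To make the fusion step described above concrete, the following is a minimal PyTorch sketch of a multi-scale attention fusion block operating on camera and LiDAR BEV feature maps. All class names, parameters, and shapes here are illustrative assumptions; the abstract does not specify the internals of the paper’s MFPN or attention modules.

# Minimal sketch of an attention-driven BEV fusion block.
# Module and parameter names are hypothetical, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttentionFusion(nn.Module):
    """Fuse camera and LiDAR BEV feature maps with multi-scale channel attention."""

    def __init__(self, cam_channels: int, lidar_channels: int,
                 out_channels: int, pool_sizes=(1, 2, 4)):
        super().__init__()
        fused = cam_channels + lidar_channels
        # Project the concatenated modalities into a shared BEV feature space.
        self.proj = nn.Conv2d(fused, out_channels, kernel_size=1)
        self.pool_sizes = pool_sizes
        # One lightweight attention branch per pooling scale.
        self.attn = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(out_channels, out_channels // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_channels // 4, out_channels, kernel_size=1),
            )
            for _ in pool_sizes
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # Both inputs are assumed to live in the same BEV grid: (B, C, H, W).
        x = self.proj(torch.cat([cam_bev, lidar_bev], dim=1))
        h, w = x.shape[-2:]
        # Accumulate attention logits computed at several spatial scales.
        logits = 0
        for size, branch in zip(self.pool_sizes, self.attn):
            pooled = F.adaptive_avg_pool2d(x, size)  # coarse spatial context
            logits = logits + F.interpolate(
                branch(pooled), size=(h, w), mode="bilinear", align_corners=False)
        # Gate the fused features so each location's modality mix is reweighted.
        return x * torch.sigmoid(logits)

if __name__ == "__main__":
    cam = torch.randn(2, 80, 180, 180)   # camera BEV features (assumed shape)
    pts = torch.randn(2, 256, 180, 180)  # LiDAR BEV features (assumed shape)
    fusion = MultiScaleAttentionFusion(80, 256, 256)
    print(fusion(cam, pts).shape)        # torch.Size([2, 256, 180, 180])

The design choice sketched here, computing attention logits at several pooled resolutions and summing them before gating, is one common way to let both coarse scene context and fine local cues reweight the fused BEV features; the paper’s actual mechanism may differ.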
