A BEV-Space LiDAR-Camera Fusion Method Based on an Attention-Driven Feature Fusion Mechanism
Keywords: Multi-scale Attention, Bird’s Eye View, LiDAR-Camera Fusion, Object Detection
Abstract. To address the perception challenges faced by autonomous vehicles and drones in complex urban environments, this paper proposes MSA-BEVFusion, a novel Bird’s Eye View (BEV) fusion method that integrates LiDAR and RGB camera data via multi-scale attention mechanisms. Unlike existing methods that tightly couple image and LiDAR features, or BEV-based approaches that rely on simplistic convolutional fusion, our method first integrates multi-scale image features through the MFPN module and then applies multi-scale attention enhancement to achieve deep fusion between camera and LiDAR features before feeding them into the detection head, ultimately delivering superior detection performance. Experiments on the nuScenes dataset demonstrate strong performance, with improvements of 0.2% NDS and 0.4% mAP over BEVFusion-MIT. The method achieves robust 3D detection in dark, rainy, and snowy conditions, with improved accuracy for small and occluded objects. Attention heatmaps reveal effective cross-modal alignment, combining LiDAR’s geometric precision with the camera’s texture detail. By enabling bidirectional interaction between modalities, this work bridges the modality gap and mitigates spatial misalignment in unified BEV representations, advancing robust environmental perception.
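To make the fusion step described above concrete, the following is a minimal PyTorch sketch of a multi-scale attention fusion block operating on camera and LiDAR BEV feature maps. All class names, parameters, and shapes here are illustrative assumptions; the abstract does not specify the internals of the paper’s MFPN or attention modules.

# Minimal sketch of an attention-driven BEV fusion block.
# Module and parameter names are hypothetical, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttentionFusion(nn.Module):
    """Fuse camera and LiDAR BEV feature maps with multi-scale channel attention."""

    def __init__(self, cam_channels: int, lidar_channels: int,
                 out_channels: int, pool_sizes=(1, 2, 4)):
        super().__init__()
        fused = cam_channels + lidar_channels
        # Project the concatenated modalities into a shared BEV feature space.
        self.proj = nn.Conv2d(fused, out_channels, kernel_size=1)
        self.pool_sizes = pool_sizes
        # One lightweight attention branch per pooling scale.
        self.attn = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(out_channels, out_channels // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_channels // 4, out_channels, kernel_size=1),
            )
            for _ in pool_sizes
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # Both inputs are assumed to live in the same BEV grid: (B, C, H, W).
        x = self.proj(torch.cat([cam_bev, lidar_bev], dim=1))
        h, w = x.shape[-2:]
        # Accumulate attention logits computed at several spatial scales.
        logits = 0
        for size, branch in zip(self.pool_sizes, self.attn):
            pooled = F.adaptive_avg_pool2d(x, size)  # coarse spatial context
            logits = logits + F.interpolate(
                branch(pooled), size=(h, w), mode="bilinear", align_corners=False)
        # Gate the fused features so each location's modality mix is reweighted.
        return x * torch.sigmoid(logits)

if __name__ == "__main__":
    cam = torch.randn(2, 80, 180, 180)   # camera BEV features (assumed shape)
    pts = torch.randn(2, 256, 180, 180)  # LiDAR BEV features (assumed shape)
    fusion = MultiScaleAttentionFusion(80, 256, 256)
    print(fusion(cam, pts).shape)        # torch.Size([2, 256, 180, 180])

The design choice sketched here, computing attention logits at several pooled resolutions and summing them before gating, is one common way to let both coarse scene context and fine local cues reweight the fused BEV features; the paper’s actual mechanism may differ.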
