High-Precision 3D Recognition of Road Potholes Based on Binocular Vision and Cross-Modal Feature Fusion
Keywords: binocular vision, lightweight detection, Mask R-CNN, 3D reconstruction, road damage ratio
Abstract. Efficient detection and accurate three-dimensional (3D) characterization of road potholes are crucial for road maintenance and traffic safety. To address the high cost and poor environmental adaptability of existing detection methods, this study proposes a lightweight pothole detection and 3D reconstruction method based on binocular stereo vision and deep learning. A vehicle-mounted acquisition system was built around a ZED 2i binocular camera, and a Mask R-CNN model was used for pothole detection and pixel-level segmentation. The 3D point cloud of each pothole was reconstructed using the principles of binocular stereo vision, and a dynamic mesh density method is proposed to optimize the surface area calculation. The RANSAC algorithm was used to fit the ground plane, from which the pothole depth parameters were extracted. For the dataset, multi-scale Retinex with chromaticity preservation (MSRCP) image enhancement and a sliding-window cropping strategy (overlap rate of 0.7) were used to construct a set of 6,416 images, which significantly improved the model's robustness in complex scenes such as shadows and varying lighting conditions. Experimental results show that, at a vehicle speed of 40 km/h, the method measures pothole depth and surface area with relative errors of 12.53% and 18.19%, respectively, and calculates the damage ratio (DR) with an average accuracy of 82%. This study provides road maintenance departments with a low-cost, high-precision intelligent pothole detection solution that reduces hardware costs by 90% compared with traditional laser sensors, and it is of practical value for engineering applications.
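
The ground-plane fitting and depth extraction step mentioned in the abstract can be illustrated with a minimal RANSAC sketch. This is not the authors' implementation: the function names, the 0.01 inlier threshold, the 200 iterations, and the assumption that points are given as (x, y, z) tuples in metres are all illustrative choices. Pothole depth is taken here as the largest distance from a segmented pothole point to the fitted plane.

```python
import math
import random


def plane_from_points(p, q, r):
    """Plane (a, b, c, d) with unit normal through three points; None if collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # The normal vector is the cross product of the two edge vectors.
    a, b, c = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = math.sqrt(a * a + b * b + c * c)
    if norm < 1e-12:
        return None
    a, b, c = a / norm, b / norm, c / norm
    return a, b, c, -(a * p[0] + b * p[1] + c * p[2])


def point_plane_distance(plane, pt):
    """Unsigned distance from a point to a plane with unit normal."""
    a, b, c, d = plane
    return abs(a * pt[0] + b * pt[1] + c * pt[2] + d)


def fit_ground_plane(points, threshold=0.01, iterations=200, seed=0):
    """Basic RANSAC: keep the 3-point plane with the most inliers.

    threshold and iterations are assumed values, not taken from the paper.
    """
    rng = random.Random(seed)
    best_plane, best_count = None, -1
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:  # skip degenerate (collinear) samples
            continue
        count = sum(point_plane_distance(plane, pt) < threshold for pt in points)
        if count > best_count:
            best_plane, best_count = plane, count
    if best_plane is None:
        raise ValueError("no non-degenerate sample found")
    return best_plane


def max_pothole_depth(plane, pothole_points):
    """Largest distance from any pothole point to the fitted ground plane."""
    return max(point_plane_distance(plane, pt) for pt in pothole_points)
```

In use, `fit_ground_plane` would receive the full road point cloud (as a list), while `max_pothole_depth` would receive only the points inside the segmentation mask. A production pipeline would typically also refine the plane by least squares over the inliers, which this sketch omits.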
