High-Precision 3D Recognition of Road Potholes Based on Binocular Vision and Cross-Modal Feature Fusion

Wang, Hanzheng; Xu, Shishuo; Hu, Danyang; Wen, Zheng; Ou, Jianxi

doi:10.5194/isprs-archives-XLVIII-4-W14-2025-291-2025

Articles | Volume XLVIII-4/W14-2025

https://doi.org/10.5194/isprs-archives-XLVIII-4-W14-2025-291-2025

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

26 Nov 2025

Keywords: binocular vision, lightweight detection, Mask R-CNN, 3D reconstruction, road damage ratio

Abstract. Efficient detection and accurate three-dimensional characterization of road potholes are crucial for road maintenance and traffic safety. Existing detection methods are costly and adapt poorly to changing environments. To address this, the study proposes a lightweight pothole detection and 3D reconstruction method based on binocular stereo vision and deep learning. A vehicle-mounted acquisition system was built around a ZED 2i binocular camera, and a Mask R-CNN model was used for pothole detection and pixel-level segmentation. The 3D point cloud of each pothole was reconstructed using the principles of binocular stereo vision, and a dynamic mesh density method was proposed to optimize the surface area calculation. In addition, the RANSAC algorithm was used to fit the ground plane and extract depth parameters. Experimental results show that, at a vehicle speed of 40 km/h, the method measures pothole depth and surface area with relative errors of 12.53% and 18.19%, respectively. It also calculates the damage ratio (DR) with an average accuracy of 82%. Furthermore, Multi-Scale Retinex with Chromaticity Preservation (MSRCP) image enhancement and a sliding-window cropping strategy (overlap rate of 0.7) were used to construct a dataset of 6,416 images. This significantly improved the model's robustness in complex scenarios such as shadows and varying lighting conditions. This study provides road maintenance departments with a low-cost, high-precision intelligent pothole detection solution that reduces hardware costs by 90% compared with traditional laser sensors, and it demonstrates significant value for engineering applications.
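The abstract reconstructs pothole point clouds "using the principles of binocular stereo vision" but gives no formulas. The sketch below shows the textbook relation for a rectified stereo pair. Depth follows from disparity as Z = f·B/d. The segmented pixels are then back-projected through a pinhole model. The function name, the equal-focal-length assumption and the intrinsics in the usage line are all hypothetical and are not the paper's calibration values.

```python
import numpy as np


def disparity_to_points(disparity, mask, f_px, baseline_m, cx, cy):
    """Back-project masked pixels of a rectified disparity map into camera coordinates.

    Assumes a rectified stereo pair with equal focal lengths (fx == fy == f_px, in pixels),
    a baseline in metres, and a principal point (cx, cy) in pixels.
    """
    valid = mask & (disparity > 0)          # skip pixels without a stereo match
    v, u = np.nonzero(valid)                # row (v) and column (u) indices of valid pixels
    d = disparity[v, u].astype(float)
    z = f_px * baseline_m / d               # depth from disparity: Z = f * B / d
    x = (u - cx) * z / f_px                 # pinhole back-projection along the image x axis
    y = (v - cy) * z / f_px                 # and along the image y axis
    return np.column_stack([x, y, z])       # (N, 3) point cloud in metres


# Hypothetical example: uniform disparity of 70 px inside a 4x4 segmentation mask.
disp = np.full((4, 4), 70.0)
pothole_mask = np.ones((4, 4), dtype=bool)
pts = disparity_to_points(disp, pothole_mask, f_px=700.0, baseline_m=0.12, cx=2.0, cy=2.0)
# Every point lies at Z = 700 * 0.12 / 70 = 1.2 m.
```

In a full pipeline, `mask` would come from the Mask R-CNN segmentation. The disparity map would come from the stereo matcher.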
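The abstract does not describe how the dynamic mesh density method chooses its density. As a rough illustration of the underlying surface-area computation, the sketch uses a fixed-spacing grid instead. It assumes the pothole surface has already been resampled into a regular height map, splits every grid cell into two triangles and sums their areas. A density-adaptive scheme would vary `spacing`. That adaptation, and the function name, are assumptions not taken from the paper.

```python
import numpy as np


def triangulated_area(z, spacing):
    """Surface area of a height map z[i, j] sampled on a regular square grid.

    Each grid cell is split into two triangles and the triangle areas are summed,
    so a finer spacing (denser mesh) follows curved pothole walls more closely.
    """
    rows, cols = z.shape
    ys, xs = np.mgrid[0:rows, 0:cols] * spacing
    p = np.stack([xs, ys, z], axis=-1)      # (rows, cols, 3) grid vertices
    a, b = p[:-1, :-1], p[:-1, 1:]          # top-left and top-right corners of each cell
    c, d = p[1:, :-1], p[1:, 1:]            # bottom-left and bottom-right corners
    t1 = np.cross(b - a, c - a)             # triangle (a, b, c)
    t2 = np.cross(d - b, c - b)             # triangle (b, d, c)
    return 0.5 * float(np.linalg.norm(t1, axis=-1).sum() + np.linalg.norm(t2, axis=-1).sum())


flat_area = triangulated_area(np.zeros((11, 11)), spacing=0.1)   # 1 m x 1 m flat patch
gy, gx = np.mgrid[0:11, 0:11] * 0.1
sloped_area = triangulated_area(gx, spacing=0.1)                 # same patch tilted 45 degrees
```

The tilted patch has a larger true area than its 1 m² footprint. This is why a surface-based measure can exceed a simple projected-pixel count for deep, steep-walled potholes.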
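For the RANSAC ground-plane step, the sketch below fits a plane to the surrounding road points. It repeatedly samples three points and keeps the plane with the most inliers, then refines that plane by least squares (SVD) on the inliers. It assumes that the depth parameter is the maximum perpendicular distance of pothole points beyond the plane, measured with the camera at the origin. The paper may define depth differently. The threshold, iteration count and synthetic data are hypothetical.

```python
import numpy as np


def fit_plane_ransac(points, threshold=0.01, iterations=200, seed=0):
    """Fit a plane to an (N, 3) point cloud with RANSAC, then refine it on the inliers."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(iterations):
        s = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(s[1] - s[0], s[2] - s[0])
        norm = np.linalg.norm(n)
        if norm < 1e-12:                    # collinear sample defines no unique plane
            continue
        n = n / norm
        inliers = np.abs((points - s[0]) @ n) < threshold
        if best is None or inliers.sum() > best.sum():
            best = inliers
    if best is None:
        raise ValueError("no non-degenerate sample found")
    # Least-squares refinement: the normal is the direction of least variance.
    centroid = points[best].mean(axis=0)
    _, _, vt = np.linalg.svd(points[best] - centroid, full_matrices=False)
    return vt[-1], centroid, best


def max_depth_below_plane(pothole_points, normal, centroid):
    """Largest distance of pothole points beyond the road plane, for a camera at the origin."""
    if np.dot(normal, -centroid) < 0:       # orient the normal toward the camera
        normal = -normal
    return float(((centroid - pothole_points) @ normal).max())


# Synthetic scene: flat road 2.0 m from the camera, pothole floor 5 cm farther away.
rng = np.random.default_rng(1)
road = np.column_stack([rng.uniform(-1, 1, 500), rng.uniform(-1, 1, 500), np.full(500, 2.0)])
hole = np.column_stack([rng.uniform(-0.1, 0.1, 50), rng.uniform(-0.1, 0.1, 50), np.full(50, 2.05)])
normal, centroid, inliers = fit_plane_ransac(np.vstack([road, hole]))
depth = max_depth_below_plane(hole, normal, centroid)
```

Because the pothole points lie outside the inlier threshold, they do not bias the fitted road plane. This is the main reason to prefer RANSAC over a plain least-squares fit here.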
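The sliding-window cropping strategy is only characterized in the abstract by an overlap rate of 0.7. The sketch below interprets that rate as the fraction of a window shared with its neighbour, which gives a stride of 0.3 × the window size. It also adds an extra crop flush with each image border so that edges are not dropped. The 640-pixel tile size, the 1280×720 frame and the function names are hypothetical.

```python
def window_starts(length, win, overlap):
    """Start offsets along one image axis for windows of size `win` overlapping by `overlap`."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if win >= length:
        return [0]                          # axis shorter than one window: single crop, caller pads
    stride = max(1, round(win * (1.0 - overlap)))
    starts = list(range(0, length - win + 1, stride))
    if starts[-1] != length - win:          # add a crop flush with the far edge
        starts.append(length - win)
    return starts


def sliding_window_boxes(width, height, win, overlap=0.7):
    """(x0, y0, x1, y1) crop boxes covering a width x height image."""
    return [(x, y, x + win, y + win)
            for y in window_starts(height, win, overlap)
            for x in window_starts(width, win, overlap)]


boxes = sliding_window_boxes(1280, 720, win=640)   # 5 horizontal x 2 vertical offsets -> 10 crops
```

A high overlap such as 0.7 multiplies the number of crops per frame. This enlarges the training set and makes it more likely that each pothole appears whole in at least one crop.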
