Exploring modern end-to-end AI-based multi-view 3D reconstruction
Keywords: Image Orientation, Bundle Adjustment, Foundation Model, Deep Learning, 3D Reconstruction
Abstract. Deriving accurate 3D geometry from multi-view 2D imagery remains a fundamental problem in photogrammetry and computer vision. Conventional pipelines, comprising feature extraction, image matching, bundle adjustment and dense reconstruction, are grounded in well-established geometric principles but remain sensitive to challenging conditions such as strong illumination variability, texture deficiency and large variations in viewing angles. Recent deep learning developments have triggered a paradigm shift, reformulating multi-view 3D reconstruction as a data-driven, end-to-end optimization problem. Neural architectures now jointly learn feature representations, correspondence estimation and geometric reasoning, supported by large-scale training datasets, high-performance GPU computation, transformer networks and differentiable rendering frameworks. This study methodically examines the transition from traditional photogrammetric approaches to end-to-end AI-based reconstruction pipelines. Using benchmark geomatics datasets, we quantitatively evaluate the performance of two recent and representative end-to-end deep learning methods against classical photogrammetry. Results highlight the strengths of AI-driven approaches in 3D reconstruction and their limits in large-scale, metric-oriented mapping and modeling applications.
