A training-free method for estimating the relative height of buildings
Keywords: 3D urban morphology, Building height, Monocular depth estimation, Graph optimization
Abstract. Urbanization’s vertical shift underscores the need for accurate building height estimation to support sustainable planning. Existing methods, limited by low-resolution data and poor generalization, cannot resolve individual buildings. We propose a training-free approach that leverages the foundation model Depth Anything V2 for relative depth estimation from high-resolution remote sensing (RS) imagery. To address GPU memory constraints, RS images are cropped into overlapping patches, and the per-patch depth predictions are unified by a height-weighted graph optimization with Levenberg–Marquardt refinement that prioritizes building-related errors. Viewpoint bias, modeled as terrain variation, is then removed by subtracting a morphologically derived DEM, converting relative depth into relative height. Experiments on 0.5 m Google satellite imagery over 20 km² of Wuhan, validated against airborne LiDAR, yield an R² of 0.73 for building heights, significantly outperforming the state-of-the-art MLSBRN (R² = 0.25), which underestimates tall buildings. Without annotation or training, our scalable method accurately estimates individual building heights, generalizes across complex urban morphologies, and provides a robust solution for 3D urban studies.
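The sketch below illustrates the patch-merging workflow summarized in the abstract; it is a minimal illustration under assumed interfaces, not the authors' implementation. The patch splitter, the overlap pairs, and the per-pair building weights are hypothetical stand-ins, as is any call that would produce the per-patch Depth Anything V2 outputs. Each patch's relative depth is aligned to its neighbours by solving a weighted least-squares problem for per-patch scale and offset, refined with Levenberg–Marquardt, and a morphologically derived DEM (grey opening) is subtracted to obtain relative heights.

```python
# Minimal sketch (not the authors' code) of the patch alignment and
# DEM-subtraction steps described in the abstract.
import numpy as np
from scipy.optimize import least_squares
from scipy.ndimage import grey_opening


def split_into_patches(image, size=1024, overlap=256):
    """Yield (row, col, patch) tiles with the given overlap (hypothetical helper)."""
    step = size - overlap
    h, w = image.shape[:2]
    for r in range(0, max(h - overlap, 1), step):
        for c in range(0, max(w - overlap, 1), step):
            yield r, c, image[r:r + size, c:c + size]


def align_patches(n_patches, pairs, weights):
    """Solve per-patch (scale, offset) so overlapping depth samples agree.

    pairs   : list of (i, j, d_i, d_j), where d_i and d_j are 1-D arrays of
              co-located relative-depth samples from the overlap of patches i, j
    weights : per-pair scalars, e.g. larger where buildings dominate the overlap
    """
    x0 = np.concatenate([np.ones(n_patches), np.zeros(n_patches)])  # scales, offsets

    def residuals(x):
        s, t = x[:n_patches], x[n_patches:]
        res = []
        for w, (i, j, d_i, d_j) in zip(weights, pairs):
            # Weighted disagreement between the two rescaled depth maps.
            res.append(w * (s[i] * d_i + t[i] - s[j] * d_j - t[j]))
        # Soft anchor on patch 0 to remove the global scale/offset ambiguity.
        res.append(np.array([s[0] - 1.0, t[0]]))
        return np.concatenate(res)

    sol = least_squares(residuals, x0, method="lm")  # Levenberg–Marquardt refinement
    return sol.x[:n_patches], sol.x[n_patches:]


def relative_height(depth_mosaic, dem_kernel=151):
    """Subtract a morphologically derived DEM from the unified depth mosaic.

    Grey opening approximates the ground surface when larger values correspond
    to nearer (taller) objects, as with inverse-depth style outputs; the sign
    convention would need adjusting for a plain metric-depth map.
    """
    dem = grey_opening(depth_mosaic, size=(dem_kernel, dem_kernel))
    return depth_mosaic - dem
```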
