3DGeoRef: an automated framework for georeferencing heritage 3D models
Keywords: Data Space for Cultural Heritage, 3D heritage models, georeferencing, VLM, multimodal models
Abstract. Geolocalization is the process of determining the precise geographical coordinates and orientation of a device, person or object. Georeferencing is the same process applied to images, maps or 3D models. Pinpointing where in the world an image was acquired or a 3D model is located, to within a decimetre or a metre, remains a challenge in photogrammetry and computer vision, especially when no priors are available or non-touristic locations are considered. Methods have evolved from simple GNSS tagging to complex computer vision and AI-driven spatial reasoning, including recent LLM/VLM/MLLM approaches. This work presents an automated pipeline to georeference heritage 3D models that lack geolocation metadata. The method combines the generation of synthetic views of a non-georeferenced 3D model, location estimation based on VLMs and multimodal models, satellite imagery retrieval and learning-based image matching to determine the transformation that aligns the 3D model with real-world coordinates. Sub-metre accuracy is achieved when a substantial amount of surrounding data is available to support the inference of the initial rough location. The final aim of the pipeline is to supplement the Cultural Heritage European Data Space with enriched 3D models.
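
The abstract does not state which transformation model is estimated in the final alignment step. As a minimal sketch, assuming a 3D similarity transformation (scale, rotation, translation) and assuming that image matching has already produced 3D correspondences between model-space and world-space points, the parameters could be recovered with the closed-form least-squares solution of Umeyama (1991). The function name `estimate_similarity` and all numeric values below are hypothetical and serve only as illustration; this is not necessarily the method used in the pipeline.

```python
import numpy as np


def estimate_similarity(src, dst):
    """Least-squares similarity transform so that dst ~ s * R @ src + t.

    Closed-form solution after Umeyama (1991). `src` and `dst` are (n, d)
    arrays of corresponding points (hypothetical inputs for illustration).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n, dim = src.shape

    # Centre both point sets on their centroids.
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between target and source, then its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(dim)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1, -1] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Synthetic check: apply a known transform to random model points and recover it.
rng = np.random.default_rng(0)
model_pts = rng.uniform(-10.0, 10.0, size=(6, 3))  # model-space coordinates

angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
s_true = 2.5
t_true = np.array([500.0, 1200.0, 35.0])  # assumed local metric offsets

world_pts = s_true * model_pts @ R_true.T + t_true
s_est, R_est, t_est = estimate_similarity(model_pts, world_pts)
```

With noise-free correspondences the estimated parameters match the ones used to generate the data up to floating-point precision. With real matches, outliers would normally be rejected first (for example with a RANSAC loop around this estimator), and world coordinates would be expressed in a projected metric system rather than latitude and longitude.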
