Visual localization in urban environments employing 3D city models
Keywords: Visual Localization, 3D City Models, CityGML, Indirect Pose Estimation, Line Features, Bayesian Optimization
Abstract. Reliable pose information is essential for many applications, such as navigation and surveying. Although GNSS is a well-established technique for obtaining this information, it often fails in urban environments due to signal occlusion or multipath effects. In addition, GNSS can be subject to jamming or spoofing, which calls for an alternative, complementary positioning method. We introduce a visual localization method that employs building models conforming to the CityGML standard. In contrast to the most commonly used scene representations in visual localization, such as structure-from-motion (SfM) point clouds, CityGML models are already freely available for many cities worldwide, require little memory, and do not have to be generated from images. Yet 3D models are rarely used, because they usually lack properties such as texture or contain only general geometric structures. Our approach uses the boundary representation (BREP) of CityGML models in Level of Detail (LOD) 2 together with the geometry of the query image scene, described by extracted straight line segments. We investigate how an energy function can measure the quality of the correspondence between the line segments of the query image and the line segments of the CityGML model projected under a given camera pose. This energy function is then optimized to estimate the camera pose of the query image. We show that a rough estimate of the camera pose is possible purely from the distribution of the line segments, without prior computation of features and their descriptors. Many possibilities for improvement remain open; if these are pursued, we expect CityGML models to be a promising option for scene representation in visual localization.
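To make the idea of a line-based energy function concrete, the following is a minimal sketch and not the formulation used in this work. It assumes a pinhole camera, model edges given as 3D segments, and an energy defined as the mean distance from points sampled on the projected model edges to the nearest image line segment. All names (`project_points`, `point_to_segment_distance`, `energy`) and the synthetic facade are hypothetical and chosen for illustration only; the optimizer that would minimize the energy over the camera pose is omitted.

```python
import numpy as np

def project_points(points_world, R, t, K):
    """Pinhole projection of Nx3 world points; returns Nx2 pixels and a mask
    marking the points that lie in front of the camera."""
    cam = points_world @ R.T + t                  # world frame -> camera frame
    in_front = cam[:, 2] > 1e-6
    z = np.where(in_front, cam[:, 2], 1.0)        # avoid division by zero
    pix = (cam @ K.T)[:, :2] / z[:, None]
    return pix, in_front

def point_to_segment_distance(p, a, b):
    """Distances from M points p (Mx2) to N segments a->b (Nx2 each): MxN."""
    ab = b - a
    ap = p[:, None, :] - a[None, :, :]
    denom = np.maximum((ab ** 2).sum(axis=1), 1e-12)
    s = np.clip((ap * ab[None]).sum(axis=2) / denom, 0.0, 1.0)
    closest = a[None] + s[..., None] * ab[None]
    return np.linalg.norm(p[:, None, :] - closest, axis=2)

def energy(R, t, K, model_segments, image_segments, samples=10):
    """Assumed energy: mean distance (in pixels) from points sampled on the
    projected model edges to the nearest image line segment."""
    w = np.linspace(0.0, 1.0, samples)[None, :, None]
    start, end = model_segments[:, 0], model_segments[:, 1]
    pts = (1.0 - w) * start[:, None, :] + w * end[:, None, :]
    pix, visible = project_points(pts.reshape(-1, 3), R, t, K)
    if not visible.any():
        return float("inf")
    d = point_to_segment_distance(
        pix[visible], image_segments[:, 0], image_segments[:, 1]
    )
    return float(d.min(axis=1).mean())

# Synthetic example: a 10 m x 10 m facade outline, 20 m in front of the camera.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
corners = np.array(
    [[-5, -5, 20], [5, -5, 20], [5, 5, 20], [-5, 5, 20]], dtype=float
)
model_segments = np.stack([corners, np.roll(corners, -1, axis=0)], axis=1)

# "Detected" image segments are simulated by projecting with the true pose.
R_true, t_true = np.eye(3), np.zeros(3)
pix, _ = project_points(model_segments.reshape(-1, 3), R_true, t_true, K)
image_segments = pix.reshape(-1, 2, 2)

e_true = energy(R_true, t_true, K, model_segments, image_segments)
e_shifted = energy(R_true, np.array([0.5, 0.0, 0.0]), K,
                   model_segments, image_segments)
```

In this toy setting the energy is lowest at the pose that generated the image segments and grows when the camera is displaced, which is the property a black-box optimizer over the pose parameters would rely on. Real query images add clutter and missing edges, so a practical energy would likely need to be more robust than this plain mean distance.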