The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Articles | Volume XLVIII-4/W14-2025
https://doi.org/10.5194/isprs-archives-XLVIII-4-W14-2025-275-2025
26 Nov 2025

A Lightweight Indoor Localization Method Integrating Keyframe Recognition and Inertial Navigation

Chenzhe Wang, Kai Bi, Shu Peng, Jie Zhu, Zhaolong Li, Han Liu, Shuaiyi Shi, Yujia Chen, and Shiliang Tao

Keywords: Indoor Positioning System, Inertial Navigation, Key Frame Classification, Feature Point Recognition, XFeat, MobileNet V3-Small

Abstract. Most traditional indoor localization schemes require hardware to be installed in advance, which makes costs hard to control and imposes ongoing maintenance. Pure vision-based localization offers low cost and deployment-free operation, but it still faces two major technical bottlenecks. First, vision-based systems that rely solely on point cloud data carry a substantial computational burden, making it difficult to meet the real-time requirements of mobile terminals. This challenge stems partly from the intensive operations that point cloud processing requires, such as feature extraction and spatial alignment, whose complexity often exceeds the hardware capabilities of portable devices designed for energy efficiency. Second, image matching schemes based on keyframes are prone to position jumps, particularly in dynamic scenes or feature-poor areas. To address these constraints, this paper proposes a lightweight indoor positioning framework that tightly couples visual data with inertial measurements. The framework is structured into three sequential phases: data preprocessing, real-time visual localization computation, and fused positioning output. In the data preprocessing phase, image data covering the entire indoor scene is acquired, and representative keyframes are selected to train a keyframe recognizer, reducing redundant information and improving subsequent matching efficiency. Concurrently, feature point descriptors of the selected keyframes are extracted and organized into a structured environmental feature database. In the real-time visual localization phase, relatively precise position estimates are computed from an externally supplied live video stream through keyframe matching and feature point correspondence, with the pre-built environmental feature database accelerating the matching process. Finally, in the fused localization output phase, the system integrates the visual localization with data from an Inertial Measurement Unit (IMU) in an Extended Kalman Filter, producing smooth, continuous, and precise positions. Compared with conventional vision-based solutions, the system refines the motion trajectory by recursively propagating inertial data under visual feature constraints, significantly improving the spatiotemporal continuity of the localization results while preserving visual localization accuracy.
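
To make the real-time visual localization phase concrete, the following minimal Python sketch (an illustration under stated assumptions, not the authors' code) shows the flow the abstract describes: a MobileNetV3-Small classifier recognizes the best-matching keyframe, live-frame descriptors are matched against that keyframe's database entry, and a PnP solve yields a position. The database layout, NUM_KEYFRAMES, K_INTRINSICS, and the precomputed frame_kpts/frame_desc inputs (which XFeat would supply in the paper) are all assumptions.

import numpy as np
import cv2
import torch
from torchvision import models

NUM_KEYFRAMES = 200                          # assumed size of the keyframe set
K_INTRINSICS = np.eye(3, dtype=np.float32)   # placeholder camera intrinsics

# Keyframe recognizer: MobileNetV3-Small with its head sized to the number of
# keyframes; in the paper the weights come from the preprocessing phase.
recognizer = models.mobilenet_v3_small(num_classes=NUM_KEYFRAMES)
recognizer.eval()

def recognize_keyframe(frame_tensor):
    # Return the index of the keyframe class predicted for this frame.
    with torch.no_grad():
        logits = recognizer(frame_tensor.unsqueeze(0))
    return int(logits.argmax(dim=1))

def localize(frame_tensor, frame_kpts, frame_desc, database):
    # database[k] holds (descriptors, 3-D points) for keyframe k, built offline.
    # frame_kpts / frame_desc are the live frame's keypoint pixel coordinates
    # and float32 descriptors, assumed precomputed by the feature extractor.
    kf_idx = recognize_keyframe(frame_tensor)
    db_desc, db_points3d = database[kf_idx]
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(frame_desc, db_desc, k=2)
    # Lowe's ratio test discards ambiguous correspondences.
    good = [m for m, n in matches if m.distance < 0.8 * n.distance]
    if len(good) < 10:
        return None                          # feature-poor view: defer to the IMU
    obj = np.float32([db_points3d[m.trainIdx] for m in good])
    img = np.float32([frame_kpts[m.queryIdx] for m in good])
    # PnP with RANSAC recovers the camera pose from 2D-3D correspondences.
    ok, rvec, tvec, _ = cv2.solvePnPRansac(obj, img, K_INTRINSICS, None)
    return tvec if ok else None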

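The fused output stage can likewise be illustrated with a deliberately simplified planar filter. The sketch below is an assumption, not the authors' formulation: it keeps a [x, y, vx, vy] state, propagates it with IMU accelerations between visual fixes, and corrects it whenever keyframe matching yields a position. With this linear model the EKF reduces to a standard Kalman filter; the paper's filter presumably also tracks orientation, and the time step and noise covariances here are placeholders.

import numpy as np

class VisualInertialEKF:
    # Planar state [x, y, vx, vy]: a constant-velocity model driven by IMU
    # acceleration and corrected by visual position fixes.
    def __init__(self, dt=0.01):
        self.dt = dt
        self.x = np.zeros(4)                          # state estimate
        self.P = np.eye(4)                            # state covariance
        self.Q = np.diag([1e-4, 1e-4, 1e-2, 1e-2])    # process noise (assumed)
        self.R = np.diag([0.05, 0.05])                # visual fix noise (assumed)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt              # x += vx*dt, y += vy*dt

    def predict(self, accel_xy):
        # Propagate with an IMU acceleration already resolved in the map frame.
        B = np.array([[0.5 * self.dt ** 2, 0.0],
                      [0.0, 0.5 * self.dt ** 2],
                      [self.dt, 0.0],
                      [0.0, self.dt]])
        self.x = self.F @ self.x + B @ np.asarray(accel_xy, dtype=float)
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, visual_xy):
        # Correct with a position fix from keyframe matching.
        H = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0]])          # we observe position only
        y = np.asarray(visual_xy, dtype=float) - H @ self.x   # innovation
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ H) @ self.P

Running predict() at the IMU rate and update() only on successful keyframe matches is what smooths the position jumps the abstract describes: between visual fixes the trajectory evolves continuously from inertial data, and each fix pulls the estimate back toward the visually anchored position.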