Object localization and change detection in urban environments using dashcam videos
Keywords: 3D mapping, semantic segmentation, object detection, monocular, change detection, MMT
Abstract. The rapid evolution of urban landscapes necessitates efficient mapping solutions. Traditional high-accuracy semantic maps, generated using expensive sensors and mobile mapping vehicles, provide precise spatial data but are costly and difficult to scale. Crowdsourced dashcam videos present a practical alternative for acquiring urban visual data, leveraging widely available, low-cost camera technology. Recent advances in photogrammetry and computer vision, such as Structure from Motion (SfM), Simultaneous Localization and Mapping (SLAM), semantic segmentation and object detection, enable the extraction of both 3D and semantic information from monocular images. Building upon previous research, we propose a pipeline for constructing and updating semantic 3D maps from crowdsourced low-cost dashcam footage, with a particular emphasis on automatic change detection. Our approach compares metadata related to urban landmarks (e.g., traffic signs) to identify modifications in cityscapes. We evaluate the robustness of the proposed approach on various sequences captured under challenging conditions, including rain, darkness and fog, and compare the performance of SfM-based and SLAM-based 3D reconstruction methods. Results show the effectiveness of the proposed low-cost methodology in localizing urban objects and changes, although accuracy still needs to be improved through better georeferencing procedures.
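
To illustrate the idea of comparing landmark metadata between two mapping epochs, the following is a minimal sketch and not the pipeline's actual implementation. It assumes each landmark is reduced to a semantic label and a 2D position in a local metric frame, and it uses a greedy nearest-neighbour match with a hypothetical distance threshold `max_dist`; the names `Landmark` and `detect_changes` are illustrative only.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Landmark:
    """Hypothetical landmark record: semantic class plus georeferenced position."""
    label: str  # e.g. "stop_sign"
    x: float    # easting in metres (assumed local projected frame)
    y: float    # northing in metres


def detect_changes(reference, current, max_dist=2.0):
    """Greedily match same-label landmarks lying within max_dist metres.

    Returns (removed, added): reference landmarks with no counterpart in
    the current map, and current landmarks with no counterpart in the
    reference map. The 2 m default is an assumed tolerance, not a value
    taken from the paper.
    """
    unmatched = list(current)
    removed = []
    for ref in reference:
        candidates = [
            c for c in unmatched
            if c.label == ref.label
            and hypot(c.x - ref.x, c.y - ref.y) <= max_dist
        ]
        if candidates:
            # Consume the closest candidate so it cannot be matched twice.
            nearest = min(candidates, key=lambda c: hypot(c.x - ref.x, c.y - ref.y))
            unmatched.remove(nearest)
        else:
            removed.append(ref)
    return removed, unmatched


reference_map = [Landmark("stop_sign", 0.0, 0.0), Landmark("speed_limit", 10.0, 0.0)]
current_map = [Landmark("stop_sign", 0.5, 0.3), Landmark("yield_sign", 10.0, 0.0)]
removed, added = detect_changes(reference_map, current_map)
```

In this toy example the stop sign is matched despite a small positional offset, the speed limit sign is reported as removed, and the yield sign as added. The threshold would in practice have to reflect the georeferencing error of the reconstruction.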
