INTEGRATING MOTION PRIORS FOR END-TO-END ATTENTION-BASED MULTI-OBJECT TRACKING
Keywords: Pedestrian Tracking, Image Sequence Analysis, Attention, Transformer, Motion Modelling
Abstract. Recent advances in multi-object tracking (MOT) rely heavily on object detection models, with attention-based models such as the DEtection TRansformer (DETR) demonstrating state-of-the-art capabilities. However, using attention-based detection models for tracking is limited by their large parameter count, which requires substantial training data and powerful hardware for parameter estimation. If this limitation is not addressed, valuable temporal information can be lost, resulting in decreased tracking performance and more identity (ID) switches. To address this challenge, we propose a novel framework that incorporates motion priors directly into the tracking attention layer, enabling an end-to-end solution. Our contributions are: I) a novel approach for integrating motion priors into attention-based multi-object tracking models, and II) a specific realisation of this approach using a Kalman filter with a constant velocity assumption as the motion prior. We evaluated our method on the multi-object tracking dataset MOT17; initial results are reported in the paper. Compared to a baseline model without a motion prior, the new method reduces the number of ID switches.
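As an illustration of the motion prior named in contribution II, the following is a minimal sketch of a constant-velocity Kalman filter for one image coordinate of a tracked object. It is not the paper's implementation: the function names `cv_predict` and `cv_update`, the noise parameters `q` and `r`, and the per-axis state layout are assumptions made for this example, and how the predicted position enters the tracking attention layer is not shown here.

```python
# Constant-velocity Kalman filter for a single coordinate (hypothetical sketch).
# State: position p and velocity v; covariance stored as (a, b, c) for the
# symmetric 2x2 matrix [[a, b], [b, c]].

def cv_predict(p, v, cov, dt=1.0, q=0.01):
    """Propagate the state one frame ahead assuming constant velocity."""
    a, b, c = cov
    p_pred = p + v * dt          # position moves with the current velocity
    v_pred = v                   # velocity is assumed unchanged
    # F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I (assumed noise model)
    a_pred = a + 2.0 * dt * b + dt * dt * c + q
    b_pred = b + dt * c
    c_pred = c + q
    return p_pred, v_pred, (a_pred, b_pred, c_pred)


def cv_update(p, v, cov, z, r=1.0):
    """Correct the predicted state with a measured position z (variance r)."""
    a, b, c = cov
    s = a + r                    # innovation variance
    k_p, k_v = a / s, b / s      # Kalman gain for position and velocity
    y = z - p                    # innovation: measurement minus prediction
    p_new = p + k_p * y
    v_new = v + k_v * y
    cov_new = ((1.0 - k_p) * a, (1.0 - k_p) * b, c - k_v * b)
    return p_new, v_new, cov_new


# Usage: predict where an object at x = 10 moving 2 px/frame will be next,
# then correct the prediction with a detection observed at x = 13.
p, v, cov = cv_predict(10.0, 2.0, (1.0, 0.0, 1.0))
p_corr, v_corr, cov_corr = cv_update(p, v, cov, z=13.0)
```

In this sketch the prediction step supplies the expected object position for the next frame, and the update step pulls that expectation toward the detection in proportion to the relative uncertainties.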