Multimodal Multiobject Tracking by Fusing Deep Appearance Features and Motion Information

Multiobject Tracking (MOT) is one of the most important capabilities of autonomous driving systems. However, most existing MOT methods rely on a single sensor, such as a camera, which limits their reliability. In this paper, we propose a novel MOT method that fuses the deep appearance features and the motion information of objects. The locations of objects are first determined by a 2D object detector and a 3D object detector. We use the Non-Maximum Suppression (NMS) algorithm to combine the results of the two detectors and thereby maintain detection accuracy in complex scenes. We then use a Convolutional Neural Network (CNN) to learn the deep appearance features of the objects and employ a Kalman filter to obtain their motion information. Finally, tracking is performed by associating detections across frames using both the motion information and the deep appearance features; a successful match means that the object has been tracked. Experiments on the KITTI Tracking Benchmark show that the proposed method performs the MOT task effectively, with a Multiobject Tracking Accuracy (MOTA) of up to 76.40% and a Multiobject Tracking Precision (MOTP) of up to 83.50%.
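
The detection fusion step can be illustrated with a minimal sketch of greedy NMS over the pooled outputs of two detectors. This is not the authors' implementation: the function names, the (x1, y1, x2, y2, score) box layout, the 0.5 IoU threshold, and the assumption that the 3D detections have already been projected to image-plane boxes are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, score) in image coordinates.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_detections(
    dets_2d: List[Box],
    dets_3d_projected: List[Box],
    iou_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Box]:
    """Pool both detectors' boxes and apply greedy NMS.

    Boxes are visited in descending score order; a box is kept only if it
    does not overlap an already kept box by more than iou_threshold.
    """
    candidates = sorted(dets_2d + dets_3d_projected,
                        key=lambda box: box[4], reverse=True)
    kept: List[Box] = []
    for box in candidates:
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    camera_boxes = [(0, 0, 10, 10, 0.9), (50, 50, 60, 60, 0.6)]
    lidar_boxes = [(1, 1, 11, 11, 0.8), (100, 100, 120, 120, 0.7)]
    for box in fuse_detections(camera_boxes, lidar_boxes):
        print(box)
```

In this sketch, an object seen by both detectors collapses to the higher-scoring box, while an object seen by only one detector is retained, which is one plausible reading of how pooling two detectors could help in complex scenes.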
