Online Multiple Pedestrian Tracking using Deep Temporal Appearance Matching Association

In online multiple pedestrian tracking, it is of great importance to model the appearance and geometric similarity between existing tracks and targets appearing in a new frame. The appearance model carries discriminative information of higher dimension than the geometric model. Thanks to the recent success of deep-learning-based methods, handling high-dimensional appearance information has become possible. Among many deep networks, the Siamese network with triplet loss is widely adopted as an appearance feature extractor. Since the Siamese network extracts the features of each input independently, it is possible to update and maintain target-specific features. However, it is not suitable for multi-object settings that require comparison with other inputs. In this paper, we propose a novel track appearance model based on a joint-inference network to address this issue. The proposed method enables the comparison of two inputs to be used for adaptive appearance modeling, which helps disambiguate target-observation matching and consolidate identity consistency. Diverse experimental results support the effectiveness of our method. Our work was recognized as the 3rd-highest-ranked tracker in MOTChallenge19, held at CVPR 2019.
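The contrast between the two scoring styles described above can be illustrated with a minimal sketch. This is not the authors' network: the layer sizes, the random untrained weights, and the names `siamese_embed`, `siamese_similarity`, and `joint_similarity` are all assumptions chosen for illustration. It only shows the structural difference: a Siamese extractor embeds each input on its own and compares the embeddings afterwards, whereas a joint-inference scorer sees the track patch and the detection patch together.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def linear(W, x):
    """Matrix-vector product, with W given as a list of rows."""
    return [dot(row, x) for row in W]


def siamese_embed(patch, W):
    """Embed ONE patch independently of any other input (shared weights W)."""
    h = [math.tanh(z) for z in linear(W, patch)]
    norm = math.sqrt(dot(h, h))
    return [z / norm for z in h]


def siamese_similarity(a, b, W):
    """Cosine similarity of two independently computed embeddings."""
    return dot(siamese_embed(a, W), siamese_embed(b, W))


def joint_similarity(a, b, W1, w2):
    """Score a PAIR jointly: both patches enter the same forward pass."""
    h = [math.tanh(z) for z in linear(W1, a + b)]  # a + b concatenates the lists
    return 1.0 / (1.0 + math.exp(-dot(w2, h)))     # sigmoid maps the score to (0, 1)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
D, H = 8, 4                          # assumed patch and hidden sizes
W = random_matrix(H, D, rng)         # Siamese branch weights
W1 = random_matrix(H, 2 * D, rng)    # joint network: input is the concatenated pair
w2 = [rng.gauss(0.0, 1.0) for _ in range(H)]

track_patch = [rng.random() for _ in range(D)]
detection_patch = [rng.random() for _ in range(D)]

print("siamese:", siamese_similarity(track_patch, detection_patch, W))
print("joint:  ", joint_similarity(track_patch, detection_patch, W1, w2))
```

In this sketch the Siamese embedding of a track can be cached and updated over time because it never depends on the detection, while the joint score must be recomputed for every track-detection pair.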
