Multi-Object Tracking Using Online Metric Learning with Long Short-Term Memory

The capacity to model temporal dependency by Recurrent Neural Networks (RNNs) makes it a plausible selection for the multi-object tracking (MOT) problem. Due to the nonlinear transformations and the unique memory mechanism, Long Short-Term Memory (LSTM) can consider a window of history when learning discriminative features, which suggests that the LSTM is suitable for state estimation of target objects as they move around. This paper focuses on association based MOT, and we propose a novel Siamese LSTM Network to interpret both temporal and spatial components nonlinearly by learning the feature of trajectories, and outputs the similarity score of two trajectories for data association. In addition, we also introduce an online metric learning scheme to update the state estimation of each trajectory dynamically. Experimental evaluation on MOT16 benchmark shows that the proposed method achieves competitive performance compared with other state-of-the-art works.

[1]  Rainer Stiefelhagen,et al.  Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics , 2008, EURASIP J. Image Video Process..

[2]  Silvio Savarese,et al.  Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes , 2016, ECCV.

[3]  Marshall F. Tappen,et al.  Learning pedestrian dynamics from the real world , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Volker Eiselein,et al.  High-Speed tracking-by-detection without using image information , 2017, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[5]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Xiaogang Wang,et al.  DeepReID: Deep Filter Pairing Neural Network for Person Re-identification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[8]  Luis E. Ortiz,et al.  Who are you with and where are you going? , 2011, CVPR 2011.

[9]  Luc Van Gool,et al.  You'll never walk alone: Modeling social behavior for multi-target tracking , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10]  T. Başar,et al.  A New Approach to Linear Filtering and Prediction Problems , 2001 .

[11]  Konrad Schindler,et al.  Online Multi-Target Tracking Using Recurrent Neural Networks , 2016, AAAI.

[12]  Yaakov Bar-Shalom,et al.  Sonar tracking of multiple targets using joint probabilistic data association , 1983 .

[13]  Silvio Savarese,et al.  Multiple Target Tracking in World Coordinate with Single, Minimally Calibrated Camera , 2010, ECCV.

[14]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Zhe Chen,et al.  MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Luc Van Gool,et al.  Improving Data Association by Joint Modeling of Pedestrian Trajectories and Groupings , 2010, ECCV.

[17]  Nanning Zheng,et al.  Point to Set Similarity Based Deep Feature Learning for Person Re-Identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Nanning Zheng,et al.  Deep self-paced learning for person re-identification , 2017, Pattern Recognit..

[19]  H. Kuhn The Hungarian method for the assignment problem , 1955 .

[20]  Fabio Tozeto Ramos,et al.  Simple online and realtime tracking , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[21]  Michael Felsberg,et al.  Accurate Scale Estimation for Robust Visual Tracking , 2014, BMVC.

[22]  Seunghoon Hong,et al.  Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[23]  Stan Sclaroff,et al.  MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization , 2014, ECCV.

[24]  Stefan Roth,et al.  MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking , 2015, ArXiv.

[25]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Silvio Savarese,et al.  Social LSTM: Human Trajectory Prediction in Crowded Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Abhinav Gupta,et al.  Transferring Rich Feature Hierarchies for Robust Visual Tracking , 2015, ArXiv.