TRINet: Tracking and Re-identification Network for Multiple Targets in Egocentric Videos Using LSTMs

We present a recurrent network based novel framework for tracking and re-identifying multiple targets in first-person perspective. Even though LSTMs can act as a sequence classifier, most of the previous works in multi target tracking use their output with some distance metric for data association. In this work, we employ an LSTM as a classifier and train it over the memory cells output vectors corresponding to different targets obtained from another LSTM. This classifier, based on appearance and motion features, discriminates the targets in two consecutive frames as well as re-identify them in a time interval. We integrate this classifier as an additional block in a detection free tracking architecture which enhances the performance in terms of re-identification of targets and also indicates the absence of targets. We propose a dataset of twenty egocentric videos containing multiple targets to validate our approach.

[1]  Shaogang Gong,et al.  Person Re-Identification by Support Vector Ranking , 2010, BMVC.

[2]  Shaogang Gong,et al.  Reidentification by Relative Distance Comparison , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Horst Bischof,et al.  Large scale metric learning from equivalence constraints , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[5]  Ali Farhadi,et al.  Re$^3$: Re al-Time Recurrent Regression Networks for Visual Tracking of Generic Objects , 2017, IEEE Robotics and Automation Letters.

[6]  Rainer Stiefelhagen,et al.  Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics , 2008, EURASIP J. Image Video Process..

[7]  Renu M. Rameshan,et al.  EgoTracker: Pedestrian Tracking with Re-identification in Egocentric Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[8]  Lukás Burget,et al.  Extensions of recurrent neural network language model , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[10]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[11]  Silvio Savarese,et al.  Learning to Track at 100 FPS with Deep Regression Networks , 2016, ECCV.

[12]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[13]  Peiyun Hu,et al.  Finding Tiny Faces , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Christoph Goller,et al.  Learning task-dependent distributed representations by backpropagation through structure , 1996, Proceedings of International Conference on Neural Networks (ICNN'96).

[15]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Hai Tao,et al.  Viewpoint Invariant Pedestrian Recognition with an Ensemble of Localized Features , 2008, ECCV.

[17]  Long Chen,et al.  Online multi-object tracking with convolutional neural networks , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[18]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[19]  Shengcai Liao,et al.  Deep Metric Learning for Person Re-identification , 2014, 2014 22nd International Conference on Pattern Recognition.