Learning to Track at 100 FPS with Deep Regression Networks

Machine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for offline training of neural networks that can track novel objects at test-time at 100 fps. Our tracker is significantly faster than previous methods that use neural networks for tracking, which are typically very slow to run and not practical for real-time applications. Our tracker uses a simple feed-forward network with no online training required. The tracker learns a generic relationship between object motion and appearance and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker's state-of-the-art performance. Further, our performance improves as we add more videos to our offline training set. To the best of our knowledge, our tracker is the first neural-network tracker that learns to track generic objects at 100 fps.

[1]  Stefan Roth,et al.  People-tracking-by-detection and people-detection-by-tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[3]  Serge J. Belongie,et al.  Visual tracking with online Multiple Instance Learning , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Yihong Gong,et al.  Human Tracking Using Convolutional Neural Networks , 2010, IEEE Transactions on Neural Networks.

[5]  Nando de Freitas,et al.  Learning attentional policies for tracking and recognition in video with deep networks , 2011, ICML.

[6]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Andreas Geiger,et al.  Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms , 2013 .

[9]  Dieter Fox,et al.  Multipath Sparse Coding Using Hierarchical Matching Pursuit , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Eugenio Culurciello,et al.  Tracking with deep neural networks , 2013, 2013 47th Annual Conference on Information Sciences and Systems (CISS).

[11]  Dit-Yan Yeung,et al.  Learning a Deep Compact Image Representation for Visual Tracking , 2013, NIPS.

[12]  Yi Li,et al.  DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking , 2014, BMVC.

[13]  Michael Felsberg,et al.  Accurate Scale Estimation for Robust Visual Tracking , 2014, BMVC.

[14]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Ales Leonardis,et al.  Is my new tracker really better than yours? , 2014, IEEE Winter Conference on Applications of Computer Vision.

[16]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[17]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[19]  Martin Lauer,et al.  3D Traffic Scene Understanding From Movable Platforms , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Michael Felsberg,et al.  Learning Spatially Regularized Correlation Filters for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Lisa Anne Hendricks,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2015, CVPR.

[23]  Koray Kavukcuoglu,et al.  Multiple Object Recognition with Visual Attention , 2014, ICLR.

[24]  Trevor Darrell,et al.  Fully convolutional networks for semantic segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Qingshan Liu,et al.  Robust Visual Tracking via Convolutional Networks , 2015 .

[26]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Kian-Ming Lim,et al.  Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle , 2015, Pattern Recognit..

[29]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Dit-Yan Yeung,et al.  Understanding and Diagnosing Visual Tracking Systems , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Seunghoon Hong,et al.  Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[33]  Abhinav Gupta,et al.  Transferring Rich Feature Hierarchies for Robust Visual Tracking , 2015, ArXiv.

[34]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[35]  Tal Hassner,et al.  Age and gender classification using convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[36]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Yi Li,et al.  DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking , 2015, IEEE Transactions on Image Processing.

[38]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Vibhav Vineet,et al.  Struck: Structured Output Tracking with Kernels , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  F. Porikli,et al.  DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking. , 2016, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[41]  Qingshan Liu,et al.  Robust Visual Tracking via Convolutional Networks Without Training , 2016, IEEE Transactions on Image Processing.

[42]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[43]  Zhe,et al.  The Visual Object Tracking VOT2015 Challenge Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).