Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning

We formulate tracking as an online decision-making process, where a tracking agent must follow an object despite ambiguous image frames and a limited computational budget. Crucially, the agent must decide where to look in the upcoming frames, when to reinitialize because it believes the target has been lost, and when to update its appearance model for the tracked object. Such decisions are typically made heuristically. Instead, we propose to learn an optimal decision-making policy by formulating tracking as a partially observable decision-making process (POMDP). We learn policies with deep reinforcement learning algorithms that need supervision (a reward signal) only when the track has gone awry. We demonstrate that sparse rewards allow us to quickly train on massive datasets, several orders of magnitude more than past work. Interestingly, by treating the data source of Internet videos as unlimited streams, we both learn and evaluate our trackers in a single, unified computational stream.
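The decision loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Action` names, the `run_episode` and `discounted_return` helpers, the reward value of -1, and the use of a frame index as a stand-in for the image observation are all assumptions made for illustration. The sketch only shows the shape of the problem: one of three decisions per frame, and a reward that is nonzero only on frames where the track has been flagged as having gone awry.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Set


class Action(Enum):
    # Hypothetical action names for the three decisions named in the abstract.
    TRACK = "track"    # keep looking for the target with the current model
    REINIT = "reinit"  # target believed lost: reinitialize the tracker
    UPDATE = "update"  # update the appearance model of the tracked object


@dataclass
class Step:
    frame_index: int
    action: Action
    reward: float


def run_episode(
    num_frames: int,
    policy: Callable[[int], Action],
    failure_frames: Set[int],
) -> List[Step]:
    """Roll a policy over a video; the reward is sparse (assumed -1 on failure)."""
    steps = []
    for t in range(num_frames):
        # A real agent would condition on image features; the frame index
        # is only a placeholder observation here.
        action = policy(t)
        # Supervision exists only on frames flagged as tracking failures.
        reward = -1.0 if t in failure_frames else 0.0
        steps.append(Step(t, action, reward))
    return steps


def discounted_return(steps: List[Step], gamma: float = 0.9) -> float:
    """Discounted sum of rewards, accumulated from the last frame backwards."""
    g = 0.0
    for step in reversed(steps):
        g = step.reward + gamma * g
    return g


if __name__ == "__main__":
    # A trivial policy that always tracks; a learned policy would replace it.
    episode = run_episode(5, lambda t: Action.TRACK, failure_frames={3})
    print([s.reward for s in episode])
    print(discounted_return(episode, gamma=0.5))
```

Because most frames carry zero reward, only the flagged failures need annotation, which is the property the abstract credits for making training on very large video collections practical.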
