Minimizing Human Effort in Interactive Tracking by Incremental Learning of Model Parameters

We address the problem of minimizing human effort in interactive tracking by learning sequence-specific model parameters. Determining the optimal model parameters for each sequence is a critical problem in tracking, and we demonstrate that using sequence-specific optimal parameters yields high-precision tracking results with significantly less annotation effort. We leverage the sequential nature of interactive tracking to formulate an efficient method for learning model parameters incrementally within a maximum margin framework. Using our method, we save approximately 60--90% of the human effort required to achieve high precision on two datasets: the VIRAT dataset and an Infant-Mother Interaction dataset.
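To make the incremental max-margin idea concrete, the following is a minimal sketch, not the paper's exact formulation: it assumes a linear tracking score over track features and applies a perceptron-style margin update whenever the user's corrected track does not outscore the tracker's current output. The names `phi_user`, `phi_tracker`, `lr`, and `margin` are illustrative assumptions.

```python
import numpy as np

def incremental_max_margin_update(w, phi_user, phi_tracker, lr=0.1, margin=1.0):
    """One margin-based update of sequence-specific tracking weights w.

    phi_user    -- feature vector of the user-corrected track (treated as ground truth)
    phi_tracker -- feature vector of the tracker's best track under the current w
    The update nudges w so the corrected track scores at least `margin`
    higher than the tracker's output.
    """
    diff = np.asarray(phi_user) - np.asarray(phi_tracker)
    violation = margin - np.dot(w, diff)
    if violation > 0:          # margin constraint violated: adjust parameters
        w = w + lr * diff
    return w

# Hypothetical usage: after each user correction, refine w before re-tracking.
w = np.zeros(4)
w = incremental_max_margin_update(w,
                                  phi_user=[0.9, 0.2, 0.7, 0.1],
                                  phi_tracker=[0.4, 0.5, 0.3, 0.6])
```

In an interactive-tracking loop, such an update would run after each correction, so later frames of the same sequence are tracked with parameters adapted to that sequence.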
