Motion-Augmented Inference and Joint Kernels in Structured Learning for Object Tracking

Video object tracking is the fundamental task of continuously following an object of interest through a video sequence. It has attracted considerable attention in both academia and industry because of its diverse applications, such as automated video surveillance, augmented and virtual reality, medical applications, automated vehicle navigation and tracking, and smart devices. Challenges in video object tracking arise from occlusion, deformation, background clutter, illumination variation, fast object motion, scale variation, low resolution, rotation, out-of-view targets, and motion blur. Object tracking therefore remains an active research field. This thesis explores improving object tracking by employing 1) advanced techniques from machine learning theory to account for intrinsic changes in object appearance under those challenging conditions, and 2) object segmentation. More specifically, we propose a fast and competitive method for object tracking that models target dynamics as a stochastic process and uses structured support vector machines. First, we predict target dynamics using harmonic means and a particle filter, in which we exploit kernel machines to derive a new entropy-based observation likelihood distribution. Second, we employ online structured support vector machines to model object appearance; we analyze the responses of several kernel functions for various feature descriptors and study how such kernels can be optimally combined into a single joint kernel function. During learning, we develop a probabilistic formulation to determine model updates and use a sequential minimal optimization step to solve the structured optimization problem. We gain efficiency in the proposed tracker by 1) using the particle filter to sample the search space instead of the commonly adopted dense sampling strategies, and 2) introducing a motion-augmented regularization term during inference to constrain the output search space.
We then extend our baseline tracker to detect tracking failures or inaccuracies and to reinitialize itself when needed. To that end, we integrate object segmentation into tracking. First, we use binary support vector machines to detect tracking failures (or inaccuracies) by monitoring internal variables of our baseline tracker. We train these binary support vector machines on examples learned by the baseline tracker. Second, we propose an automated method to reinitialize the tracker and recover from tracking failures by integrating active-contour-based object segmentation and using a particle filter to sample bounding boxes for segmentation. Through extensive experiments on standard video datasets, we demonstrate, both subjectively and objectively, that our baseline and extended methods are strongly competitive with state-of-the-art object tracking methods under challenging video conditions.
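
To make the ideas of a joint kernel and motion-augmented inference concrete, the following is a minimal Python sketch under stated assumptions, not the thesis implementation. The abstract does not give the form of the joint kernel or of the regularization term, so the sketch assumes a weighted sum of per-descriptor kernels and a squared-distance penalty toward a predicted target position. All function names, the weights, and the parameter `lam` are hypothetical.

```python
import numpy as np

def linear_kernel(x, y):
    """Plain dot-product kernel."""
    return float(np.dot(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel with bandwidth sigma."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

def joint_kernel(feats_a, feats_b, kernels, weights):
    """Assumed joint kernel: weighted sum of one kernel per feature descriptor."""
    return sum(w * k(fa, fb)
               for w, k, fa, fb in zip(weights, kernels, feats_a, feats_b))

def sample_candidates(prev_box, n, pos_std, rng):
    """Draw n candidate boxes (x, y, w, h) by perturbing the previous position.

    This stands in for particle-filter sampling of the search space; only the
    position is perturbed, width and height are kept fixed.
    """
    boxes = np.tile(np.asarray(prev_box, dtype=float), (n, 1))
    boxes[:, :2] += rng.normal(0.0, pos_std, size=(n, 2))
    return boxes

def motion_augmented_argmax(scores, boxes, predicted_box, lam):
    """Pick the candidate maximizing appearance score minus a motion penalty.

    The penalty (assumed form) is lam times the squared distance between the
    candidate position and the position predicted by the motion model.
    """
    pred_xy = np.asarray(predicted_box[:2], dtype=float)
    dist2 = np.sum((boxes[:, :2] - pred_xy) ** 2, axis=1)
    return int(np.argmax(np.asarray(scores, dtype=float) - lam * dist2))
```

With `lam = 0` the selection reduces to the highest appearance score alone; increasing `lam` favors candidates close to the motion prediction, which is one way such a term could constrain the output search space.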
