Online Object Tracking, Learning and Parsing with And-Or Graphs

This paper presents a method, called <italic>AOGTracker</italic>, for simultaneously tracking, learning and parsing (TLP) of unknown objects in video sequences with a hierarchical and compositional And-Or graph (AOG) representation. The TLP method is formulated in the Bayesian framework with a spatial and a temporal dynamic programming (DP) algorithms inferring object bounding boxes on-the-fly. During online learning, the AOG is discriminatively learned using latent SVM <xref ref-type="bibr" rid="ref1">[1]</xref> to account for appearance (e.g., lighting and partial occlusion) and structural (e.g., different poses and viewpoints) variations of a tracked object, as well as distractors (e.g., similar objects) in background. Three key issues in online inference and learning are addressed: (i) maintaining purity of positive and negative examples collected online, (ii) controling model complexity in latent structure learning, and (iii) identifying critical moments to re-learn the structure of AOG based on its intrackability. The intrackability measures uncertainty of an AOG based on its score maps in a frame. In experiments, our AOGTracker is tested on two popular tracking benchmarks with the same parameter setting: the TB-100/50/CVPR2013 benchmarks <xref ref-type="bibr" rid="ref2">[2]</xref> , <xref ref-type="bibr" rid="ref3">[3]</xref> , and the VOT benchmarks <xref ref-type="bibr" rid="ref4">[4]</xref> —VOT 2013, 2014, 2015 and TIR2015 (thermal imagery tracking). In the former, our AOGTracker outperforms state-of-the-art tracking algorithms including two trackers based on deep convolutional network   <xref ref-type="bibr" rid="ref5">[5]</xref> , <xref ref-type="bibr" rid="ref6">[6]</xref> . In the latter, our AOGTracker outperforms all other trackers in VOT2013 and is comparable to the state-of-the-art methods in VOT2014, 2015 and TIR2015.

[1]  Ming-Hsuan Yang,et al.  Least Soft-thresold Squares Tracking , 2013 .

[2]  Ales Leonardis,et al.  Robust Visual Tracking Using an Adaptive Coupled-Layer Visual Model , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Haibin Ling,et al.  Robust Visual Tracking and Vehicle Classification via Sparse Representation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Jiri Matas,et al.  Long-Term Tracking through Failure Cases , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[5]  S. Carey The Origin of Concepts , 2000 .

[6]  Pascal Fua,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Multiple Object Tracking Using K-shortest Paths Optimization , 2022 .

[7]  Narendra Ahuja,et al.  Robust Visual Tracking via Structured Multi-Task Sparse Learning , 2012, International Journal of Computer Vision.

[8]  Rui Caseiro,et al.  Exploiting the Circulant Structure of Tracking-by-Detection with Kernels , 2012, ECCV.

[9]  Hanzi Wang,et al.  Incremental Learning of 3D-DCT Compact Representations for Robust Visual Tracking , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Gérard G. Medioni,et al.  Context tracker: Exploring supporters and distracters in unconstrained environments , 2011, CVPR 2011.

[11]  Song-Chun Zhu,et al.  Intrackability: Characterizing Video Statistics and Pursuing Video Representations , 2012, International Journal of Computer Vision.

[12]  François Fleuret,et al.  Exact Acceleration of Linear Object Detectors , 2012, ECCV.

[13]  Ming-Hsuan Yang,et al.  Robust Object Tracking with Online Multiple Instance Learning , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Matti Pietikäinen,et al.  Performance evaluation of texture measures with classification based on Kullback discrimination of distributions , 1994, Proceedings of 12th International Conference on Pattern Recognition.

[15]  Vibhav Vineet,et al.  Struck: Structured Output Tracking with Kernels , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Seunghoon Hong,et al.  Visual Tracking by Sampling Tree-Structured Graphical Models , 2014, ECCV.

[17]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Horst Bischof,et al.  Real-Time Tracking via On-line Boosting , 2006, BMVC.

[19]  Seunghoon Hong,et al.  Orderless Tracking through Model-Averaged Posterior Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Ming-Hsuan Yang,et al.  Incremental Learning for Robust Visual Tracking , 2008, International Journal of Computer Vision.

[21]  Horst Bischof,et al.  Hough-based tracking of non-rigid objects , 2011, 2011 International Conference on Computer Vision.

[22]  Narendra Ahuja,et al.  Robust visual tracking via multi-task sparse learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Shai Avidan,et al.  Support vector tracking , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  V G Andrew,et al.  AN EFFICIENT IMPLEMENTATION OF A SCALING MINIMUM-COST FLOW ALGORITHM , 1997 .

[25]  Charless C. Fowlkes,et al.  Globally-optimal greedy algorithms for tracking a variable number of objects , 2011, CVPR 2011.

[26]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[27]  Deva Ramanan,et al.  N-best maximal decoders for part models , 2011, 2011 International Conference on Computer Vision.

[28]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[29]  Robert T. Collins,et al.  Mean-shift blob tracking through scale space , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[30]  Bin Shen,et al.  Online robust image alignment via iterative convex optimization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Junseok Kwon,et al.  Visual tracking decomposition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Yanning Zhang,et al.  Part-Based Visual Tracking with Online Latent Structural Learning , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Lei Zhang,et al.  Fast Compressive Tracking , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Junseok Kwon,et al.  Tracking by Sampling Trackers , 2011, 2011 International Conference on Computer Vision.

[35]  David A. McAllester,et al.  Object Detection with Grammar Models , 2011, NIPS.

[36]  Nuno Vasconcelos,et al.  Biologically Inspired Object Tracking Using Center-Surround Saliency Mechanisms , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Alex Waibel,et al.  Readings in speech recognition , 1990 .

[38]  Xiaoqin Zhang,et al.  Graph Embedding Based Semi-supervised Discriminative Tracker , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[39]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  David A. McAllester,et al.  Cascade object detection with deformable part models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[41]  Abhinav Gupta,et al.  Transferring Rich Feature Hierarchies for Robust Visual Tracking , 2015, ArXiv.

[42]  Yali Amit,et al.  POP: Patchwork of Parts Models for Object Recognition , 2007, International Journal of Computer Vision.

[43]  Laura Sevilla-Lara,et al.  Distribution fields for tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Rynson W. H. Lau,et al.  Visual Tracking via Locality Sensitive Histograms , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Huchuan Lu,et al.  Visual tracking via adaptive structural local sparse appearance model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Jorge Nocedal,et al.  A Limited Memory Algorithm for Bound Constrained Optimization , 1995, SIAM J. Sci. Comput..

[47]  Huchuan Lu,et al.  Robust object tracking via sparsity-based collaborative model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Andrew V. Goldberg,et al.  An efficient implementation of a scaling minimum-cost flow algorithm , 1993, IPCO.

[49]  Ehud Rivlin,et al.  Robust Fragments-based Tracking using the Integral Histogram , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[50]  Yanxi Liu,et al.  Online selection of discriminative tracking features , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Song-Chun Zhu,et al.  Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Michael Isard,et al.  CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[53]  Shai Avidan,et al.  Locally Orderless Tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Pedro F. Felzenszwalb Object detection grammars , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[55]  Gregory Shakhnarovich,et al.  Diverse M-Best Solutions in Markov Random Fields , 2012, ECCV.

[56]  Huchuan Lu,et al.  Visual Tracking via Probability Continuous Outlier Model , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[58]  Junseok Kwon,et al.  Highly Nonrigid Object Tracking via Patch-Based Dynamic Appearance Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Lu Zhang,et al.  Structure Preserving Object Tracking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[60]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Kaiqi Huang,et al.  An Adaptive Combination of Multiple Features for Robust Tracking in Real Scene , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[62]  Deva Ramanan,et al.  Self-Paced Learning for Long-Term Tracking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Seunghoon Hong,et al.  Online Graph-Based Tracking , 2014, ECCV.

[64]  Michael Felsberg,et al.  The Visual Object Tracking VOT2013 Challenge Results , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[65]  Huchuan Lu,et al.  Least Soft-Threshold Squares Tracking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[66]  Jiri Matas,et al.  A Novel Performance Evaluation Methodology for Single-Target Trackers , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[67]  Zhongfei Zhang,et al.  A survey of appearance models in visual object tracking , 2013, ACM Trans. Intell. Syst. Technol..

[68]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[69]  Simon Baker,et al.  Lucas-Kanade 20 Years On: A Unifying Framework , 2004, International Journal of Computer Vision.

[70]  Horst Bischof,et al.  Semi-supervised On-Line Boosting for Robust Tracking , 2008, ECCV.

[71]  Yunde Jia,et al.  Discriminatively Trained And-Or Tree Models for Object Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[72]  Song-Chun Zhu,et al.  Image Parsing with Stochastic Scene Grammar , 2011, NIPS.

[73]  Patrick Pérez,et al.  Color-Based Probabilistic Tracking , 2002, ECCV.

[74]  Ales Leonardis,et al.  An Enhanced Adaptive Coupled-Layer LGTracker++ , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[75]  Benjamin Z. Yao,et al.  Learning and parsing video events with goal and intent prediction , 2013, Comput. Vis. Image Underst..

[76]  Mohamed ElHelw,et al.  Robust Real-Time Tracking with Diverse Ensembles and Random Projections , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[77]  Alfredo Petrosino,et al.  MATRIOSKA: A Multi-level Approach to Fast Tracking by Learning , 2013, ICIAP.

[78]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[79]  Dorin Comaniciu,et al.  Kernel-Based Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[80]  LiXi,et al.  A survey of appearance models in visual object tracking , 2013 .

[81]  Stefan Roth,et al.  People-tracking-by-detection and people-detection-by-tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[83]  Qingshan Liu,et al.  Robust Visual Tracking via Convolutional Networks , 2015 .

[84]  Yi Wu,et al.  Online Object Tracking: A Benchmark , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[85]  Luc Van Gool,et al.  Beyond semi-supervised tracking: Tracking should be as simple as detection, but not simpler than recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[86]  Michael Felsberg,et al.  Enhanced Distribution Field Tracking Using Channel Representations , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[87]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[88]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[89]  Haibin Ling,et al.  Real time robust L1 tracker using accelerated proximal gradient approach , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[90]  Yang Lu,et al.  Online Object Tracking, Learning, and Parsing with And-Or Graphs , 2014, CVPR.

[91]  Junzhou Huang,et al.  Robust tracking using local sparse appearance model and K-selection , 2011, CVPR 2011.

[92]  Song-Chun Zhu,et al.  A Numerical Study of the Bottom-Up and Top-Down Inference Processes in And-Or Graphs , 2011, International Journal of Computer Vision.

[93]  Jiri Matas,et al.  Robustifying the Flock of Trackers , 2011 .