Robust Visual Tracking by Integrating Multiple Cues Based on Co-Inference Learning

Visual tracking can be treated as a parameter estimation problem that infers target states based on image observations from video sequences. A richer target representation may incur better chances of successful tracking in cluttered and dynamic environments, and thus enhance the robustness. Richer representations can be constructed by either specifying a detailed model of a single cue or combining a set of rough models of multiple cues. Both approaches increase the dimensionality of the state space, which results in a dramatic increase of computation. To investigate the integration of rough models from multiple cues and to explore computationally efficient algorithms, this paper formulates the problem of multiple cue integration and tracking in a probabilistic framework based on a factorized graphical model. Structured variational analysis of such a graphical model factorizes different modalities and suggests a co-inference process among these modalities. Based on the importance sampling technique, a sequential Monte Carlo algorithm is proposed to provide an efficient simulation and approximation of the co-inferencing of multiple cues. This algorithm runs in real-time at around 30 Hz. Our extensive experiments show that the proposed algorithm performs robustly in a large variety of tracking scenarios. The approach presented in this paper has the potential to solve other problems including sensor fusion problems.

[1]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[2]  BlakeAndrew,et al.  C ONDENSATION Conditional Density Propagation forVisual Tracking , 1998 .

[3]  Christoph Bregler,et al.  Learning and recognizing human dynamics in video sequences , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Zoubin Ghahramani,et al.  Factorial Learning and the EM Algorithm , 1994, NIPS.

[5]  Michael J. Swain,et al.  Color indexing , 1991, International Journal of Computer Vision.

[6]  Michael Isard,et al.  CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[7]  Alex Pentland,et al.  Pfinder: real-time tracking of the human body , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[8]  Hai Tao,et al.  A Sampling Algorithm for Tracking Multiple Objects , 1999, Workshop on Vision Algorithms.

[9]  Michael J. Black,et al.  EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation , 1996, ECCV.

[10]  M. A. Tanner,et al.  Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, 3rd Edition , 1998 .

[11]  Jun S. Liu,et al.  Sequential Monte Carlo methods for dynamic systems , 1997 .

[12]  James M. Rehg,et al.  A multiple hypothesis approach to figure tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[13]  Michael Isard,et al.  Contour Tracking by Stochastic Propagation of Conditional Density , 1996, ECCV.

[14]  Michael Isard,et al.  Active Contours , 2000, Springer London.

[15]  Andrew Blake,et al.  A Probabilistic Exclusion Principle for Tracking Multiple Objects , 2000, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[16]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[17]  Gregory D. Hager,et al.  Real-time tracking of image regions with changes in geometry and illumination , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Trevor Darrell,et al.  Integrated Person Tracking Using Stereo, Color, and Pattern Detection , 2000, International Journal of Computer Vision.

[19]  Kentaro Toyama,et al.  Wallflower: principles and practice of background maintenance , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[20]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[21]  Michael I. Jordan,et al.  Exploiting Tractable Substructures in Intractable Networks , 1995, NIPS.

[22]  Ying Wu,et al.  Color tracking by transductive learning , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[23]  Rong Chen,et al.  A Theoretical Framework for Sequential Importance Sampling with Resampling , 2001, Sequential Monte Carlo Methods in Practice.

[24]  Michael I. Jordan,et al.  Factorial Hidden Markov Models , 1995, Machine Learning.

[25]  Rama Chellappa,et al.  Simultaneous tracking and verification via sequential posterior estimation , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[26]  Vladimir Pavlovic,et al.  Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[27]  Shaogang Gong,et al.  Colour Model Selection and Adaption in Dynamic Scenes , 1998, ECCV.

[28]  Stanley T. Birchfield,et al.  Elliptical head tracking using intensity gradients and color histograms , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[29]  Michael Isard,et al.  ICONDENSATION: Unifying Low-Level and High-Level Tracking in a Stochastic Framework , 1998, ECCV.

[30]  Ying Wu,et al.  Hand modeling, analysis and recognition , 2001, IEEE Signal Process. Mag..

[31]  Rajeev Sharma,et al.  Reliable tracking of human arm dynamics by multiple cue integration and constraint fusion , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[32]  Hai Tao,et al.  Dynamic layer representation with applications to tracking , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[33]  Dorin Comaniciu,et al.  Real-time tracking of non-rigid objects using mean shift , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[34]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[35]  Michael Isard,et al.  Partitioned Sampling, Articulated Objects, and Interface-Quality Hand Tracking , 2000, ECCV.

[36]  Gregory D. Hager,et al.  Joint probabilistic techniques for tracking multi-part objects , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[37]  Michael I. Jordan,et al.  An Introduction to Variational Methods for Graphical Models , 1999, Machine-mediated learning.

[38]  Gregory D. Hager,et al.  Incremental focus of attention for robust visual tracking , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[39]  Ying Wu,et al.  Bootstrap Initialization of Nonparametric Texture Models for Tracking , 2000, ECCV.

[40]  Simon J. Godsill,et al.  On sequential Monte Carlo sampling methods for Bayesian filtering , 2000, Stat. Comput..

[41]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .