Data fusion for visual tracking with particles

The effectiveness of probabilistic tracking of objects in image sequences has been revolutionized by the development of particle filtering. Whereas Kalman filters are restricted to Gaussian distributions, particle filters can propagate more general distributions, albeit only approximately. This is of particular benefit in visual tracking because of the inherent ambiguity of the visual world that stems from its richness and complexity. One important advantage of the particle filtering framework is that it allows the information from different measurement sources to be fused in a principled manner. Although this fact has been acknowledged before, it has not been fully exploited within a visual tracking context. Here we introduce generic importance sampling mechanisms for data fusion and discuss them for fusing color with either stereo sound, for teleconferencing, or with motion, for surveillance with a still camera. We show how each of the three cues can be modeled by an appropriate data likelihood function, and how the intermittent cues (sound or motion) are best handled by generating proposal distributions from their likelihood functions. Finally, the effective fusion of the cues by particle filtering is demonstrated on real teleconference and surveillance data.
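As a rough illustration of the idea described above (the intermittent cue drives the proposal, while the persistent cue is scored through its likelihood), the following is a minimal one-dimensional sketch and not the authors' implementation. Everything specific in it is an assumption made for illustration: the scalar state, the Gaussian random-walk dynamics, the Gaussian sound model, the mixture weight `alpha`, the caller-supplied `color_lik` function, and the names `gauss_pdf` and `fusion_step`. The weights follow the standard importance sampling correction: likelihood times dynamics prior, divided by the proposal density.

```python
import numpy as np


def gauss_pdf(x, mean, std):
    """Density of a 1-D Gaussian, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def fusion_step(particles, color_lik, sound_obs, rng,
                sigma_dyn=2.0, sigma_sound=5.0, alpha=0.5):
    """One filtering step with a mixture proposal (illustrative assumptions only).

    particles : equally weighted samples of the state at the previous time step
    color_lik : callable returning the color likelihood at each candidate state
    sound_obs : noisy position suggested by the sound cue, or None when silent
    """
    n = particles.size
    # Intermittent cue: use the dynamics alone when no sound measurement exists.
    a = alpha if sound_obs is not None else 0.0

    # Draw each particle either from the dynamics or from the sound-based proposal.
    from_sound = rng.random(n) < a
    proposed = particles + sigma_dyn * rng.standard_normal(n)
    if a > 0.0:
        k = int(from_sound.sum())
        proposed[from_sound] = sound_obs + sigma_sound * rng.standard_normal(k)

    # Dynamics prior and mixture proposal density at the proposed states.
    prior = gauss_pdf(proposed, particles, sigma_dyn)
    q = (1.0 - a) * prior
    if a > 0.0:
        q = q + a * gauss_pdf(proposed, sound_obs, sigma_sound)

    # Importance weights: fused likelihood * prior / proposal.
    weights = color_lik(proposed) * prior / q
    if a > 0.0:
        # Assumes the cues are conditionally independent given the state.
        weights = weights * gauss_pdf(sound_obs, proposed, sigma_sound)
    weights = weights / weights.sum()

    estimate = float(np.sum(weights * proposed))      # posterior mean estimate
    resampled = proposed[rng.choice(n, size=n, p=weights)]  # multinomial resampling
    return resampled, estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    particles = 45.0 + 3.0 * rng.standard_normal(500)
    color = lambda x: gauss_pdf(x, 50.0, 3.0)  # toy color likelihood peaked at 50
    for t in range(10):
        obs = 50.0 if t % 2 == 0 else None     # sound is present every other frame
        particles, est = fusion_step(particles, color, obs, rng)
    print(est)
```

When `sound_obs` is `None` the step reduces to a plain bootstrap filter: the proposal equals the dynamics, so the prior-to-proposal ratio is one and the weights are the color likelihood alone.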
