Tracking Atoms with Particles

We present a general framework and an efficient algorithm for tracking relevant video structures. The structures to be tracked are implicitly defined by a Matching Pursuit procedure that extracts and ranks the most important image contours. Based on this ranking, contours are automatically selected to initialize a Particle Filtering tracker. The proposed algorithm deals with salient video entities whose behavior has an intuitive meaning related to the physics of the signal. Moreover, because the interactions between such structures are easily defined, higher-level signal configurations can be inferred intuitively. The proposed algorithm outperforms existing video-structure trackers while reducing computational complexity. We demonstrate the algorithm on audiovisual source localization.
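The two-stage pipeline described above (Matching Pursuit ranks atoms, and the top-ranked atom seeds a particle filter) can be sketched roughly as follows. This is a minimal illustration under heavy assumptions, not the authors' implementation: it uses 1-D signals instead of video frames, a toy dictionary of Gaussian atoms indexed only by position (the atoms in the paper are 2-D and carry more parameters), a random-walk motion model, and a likelihood based on the correlation between the frame and the atom at each particle's position. The names `gaussian_atom`, `matching_pursuit` and `track_atom`, and all constants, are hypothetical.

```python
import numpy as np

N = 128        # signal length: a hypothetical 1-D stand-in for an image
WIDTH = 3.0    # atom width (hypothetical; the paper works with 2-D atoms)


def gaussian_atom(center, n=N, width=WIDTH):
    """Unit-norm Gaussian atom centred at a (possibly fractional) position."""
    t = np.arange(n)
    g = np.exp(-0.5 * ((t - center) / width) ** 2)
    return g / np.linalg.norm(g)


# Toy dictionary (assumption): one atom per integer position, stored as columns.
D = np.stack([gaussian_atom(c) for c in range(N)], axis=1)


def matching_pursuit(x, D, n_iter):
    """Greedy MP; returns (atom index, coefficient) pairs ranked by |coefficient|."""
    r = x.astype(float)
    picks = []
    for _ in range(n_iter):
        corr = D.T @ r                    # correlate residual with every atom
        k = int(np.argmax(np.abs(corr)))  # best-matching atom
        c = float(corr[k])
        r = r - c * D[:, k]               # subtract its contribution
        picks.append((k, c))
    # Rank by captured energy c**2, largest first.
    return sorted(picks, key=lambda p: -abs(p[1]))


def track_atom(frames, init_pos, n_particles=500, step_std=2.0, beta=10.0, seed=0):
    """Bootstrap particle filter on the position of a single atom."""
    rng = np.random.default_rng(seed)
    particles = init_pos + rng.normal(0.0, step_std, n_particles)
    estimates = []
    for x in frames:
        # Predict: random-walk motion model on the atom's position (assumption).
        particles = np.clip(particles + rng.normal(0.0, step_std, n_particles), 0, N - 1)
        # Update: weight by the correlation between the frame and the atom at each particle.
        scores = np.array([gaussian_atom(p) @ x for p in particles])
        w = np.exp(beta * (scores - scores.max()))
        w /= w.sum()
        estimates.append(float(w @ particles))  # posterior-mean position
        # Systematic resampling to counter weight degeneracy.
        u = (rng.random() + np.arange(n_particles)) / n_particles
        idx = np.minimum(np.searchsorted(np.cumsum(w), u), n_particles - 1)
        particles = particles[idx]
    return estimates


# Synthetic sequence: a strong atom drifting right plus a weak static one.
rng = np.random.default_rng(1)
true_pos = [30.0 + 1.5 * t for t in range(30)]
frames = [5.0 * gaussian_atom(p) + 0.5 * gaussian_atom(100.0)
          + rng.normal(0.0, 0.1, N) for p in true_pos]

ranked = matching_pursuit(frames[0], D, n_iter=5)
top_atom = ranked[0][0]                   # highest-ranked atom seeds the tracker
estimates = track_atom(frames[1:], float(top_atom))
print("top atom:", top_atom, "final estimate:", round(estimates[-1], 1),
      "truth:", true_pos[-1])
```

Ranking by captured energy is what lets the weak static structure be ignored here. In a fuller version, several top-ranked atoms would each seed their own tracker, and the likelihood would compare richer atom parameters than position alone.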
