Multimodal multispeaker probabilistic tracking in meetings

Tracking speakers in multiparty conversations is a fundamental task in automatic meeting analysis. In this paper, we present a probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state space, which includes an explicit proximity-based interaction model. The model integrates audio-visual (AV) data through a novel observation model: audio observations are derived from a source localization algorithm, while visual observations are based on models of the shape and spatial structure of human heads. Approximate inference in our model, necessary given its complexity, is performed with a Markov chain Monte Carlo particle filter (MCMC-PF), which yields high sampling efficiency. We present results, based on an objective evaluation procedure, showing that our framework (1) locates and tracks the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy; (2) copes with visual clutter and partial occlusion; and (3) significantly outperforms a traditional sampling-based approach.
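To give a sense of the inference machinery, the following is a minimal sketch of an MCMC-based particle filter update for a toy one-dimensional state. It is not the paper's multiperson AV model: the Gaussian motion and observation models, the parameter values, and the function name are all illustrative assumptions. The key idea it demonstrates is that of MCMC-PF approaches in general: the new particle set is drawn with Metropolis-Hastings moves targeting the filtering posterior, rather than by importance resampling.

```python
import math
import random

def mcmc_particle_filter_step(particles, observation, motion_std=1.0,
                              obs_std=1.0, n_mcmc=200, proposal_std=0.5):
    """One MCMC-PF update for a 1D state (toy illustration, not the paper's model).

    The Metropolis-Hastings chain targets p(x_t | z_t, particle set at t-1):
    a Gaussian likelihood (assumed) times the predictive prior, approximated
    as a mixture of Gaussians centered on the previous particles.
    """
    def log_target(x):
        # Assumed Gaussian observation model (up to a constant).
        log_lik = -0.5 * ((observation - x) / obs_std) ** 2
        # Predictive prior: equal-weight Gaussian mixture over old particles.
        mix = sum(math.exp(-0.5 * ((x - p) / motion_std) ** 2)
                  for p in particles)
        return log_lik + math.log(mix / len(particles) + 1e-300)

    # Initialize the chain by propagating a random previous particle.
    x = random.choice(particles) + random.gauss(0.0, motion_std)
    burn_in = n_mcmc // 2
    new_particles = []
    for i in range(n_mcmc):
        # Random-walk proposal, accepted with the Metropolis-Hastings rule.
        x_prop = x + random.gauss(0.0, proposal_std)
        if math.log(random.random() + 1e-300) < log_target(x_prop) - log_target(x):
            x = x_prop
        if i >= burn_in:  # keep post-burn-in samples as the new particle set
            new_particles.append(x)
    return new_particles
```

In the multi-target setting of the paper, the chain instead explores a joint multiperson state space, where MCMC moves update one target at a time; this is what makes the sampler efficient compared with importance sampling over the joint space, whose cost grows exponentially with the number of targets.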
