On-line multi-modal speaker diarization

This paper presents a novel framework that uses multi-modal information for speaker diarization. We use dynamic Bayesian networks to produce results on-line. As more data becomes available, we progress from a simple observation model to a complex multi-modal one, and we present an efficient way to guide the learning of the complex model with the early results obtained from the simple model. We report results in various real-world settings, including webcam videos, human-computer interaction, and video conferences.
