Integrated Online Speaker Clustering and Adaptation

Many applications require speech transcriptions to be produced causally, that is, without access to future audio. Speaker adaptation is often used to improve transcript quality, so a causal system needs online speaker clustering and incremental adaptation techniques. This paper presents an integrated approach to online speaker clustering and adaptation that clusters speakers efficiently using the same accumulated statistics that are normally used for adaptation. Using a consistent criterion for both clustering and adaptation should yield gains for both stages. The proposed approach is evaluated on a meeting transcription task using audio from multiple distant microphones. Consistent gains over standard clustering and adaptation were obtained.
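The idea of reusing adaptation statistics for clustering can be illustrated with a toy sketch. This is not the paper's method: the abstract does not say which transform or criterion is used, so the sketch assumes the simplest possible case, a single unit-variance Gaussian with a per-speaker bias (mean offset) as the "adaptation transform". All names (`Stats`, `segment_stats`, `gain_per_frame`, `OnlineClusterer`, `new_cluster_penalty`) are hypothetical. Each cluster stores only a frame count and a sum of mean-normalised frames. Those two quantities are enough to estimate the bias and also to score how well an existing cluster's bias explains a new segment, so a single likelihood criterion drives both decisions.

```python
from dataclasses import dataclass, field


@dataclass
class Stats:
    """Sufficient statistics for a bias transform (assumed unit-variance model)."""
    n: float = 0.0                          # number of frames accumulated
    s: list = field(default_factory=list)   # sum of (frame - global mean)

    def add(self, other):
        # Incremental adaptation: statistics are simply summed.
        if not self.s:
            self.s = [0.0] * len(other.s)
        self.n += other.n
        self.s = [a + b for a, b in zip(self.s, other.s)]

    def bias(self):
        # Maximum-likelihood bias estimate; assumes n > 0.
        return [v / self.n for v in self.s]


def segment_stats(frames, global_mean):
    """Accumulate statistics for one non-empty segment of feature frames."""
    s = [0.0] * len(global_mean)
    for x in frames:
        for d, mu in enumerate(global_mean):
            s[d] += x[d] - mu
    return Stats(n=float(len(frames)), s=s)


def gain_per_frame(seg, bias):
    """Per-frame log-likelihood gain from applying `bias` to the segment,
    relative to no adaptation, computed from the statistics alone."""
    dot = sum(b * v for b, v in zip(bias, seg.s))
    sq = sum(b * b for b in bias)
    return (dot - 0.5 * seg.n * sq) / seg.n


class OnlineClusterer:
    """Causal clustering: each segment is assigned once, in arrival order."""

    def __init__(self, global_mean, new_cluster_penalty=1.0):
        self.global_mean = global_mean
        self.penalty = new_cluster_penalty  # hypothetical tuning threshold
        self.clusters = []

    def process(self, frames):
        seg = segment_stats(frames, self.global_mean)
        own_gain = gain_per_frame(seg, seg.bias())  # best achievable gain
        best, best_gain = None, float("-inf")
        for i, cluster in enumerate(self.clusters):
            g = gain_per_frame(seg, cluster.bias())
            if g > best_gain:
                best, best_gain = i, g
        # Open a new cluster when no existing transform comes close enough
        # to the gain the segment would get from its own transform.
        if best is None or own_gain - best_gain > self.penalty:
            self.clusters.append(Stats())
            best = len(self.clusters) - 1
        self.clusters[best].add(seg)  # same statistics update the transform
        return best
```

In this simplified setting the difference between `own_gain` and a cluster's gain reduces to half the squared distance between the segment's mean offset and the cluster's bias, which is why one set of statistics can serve both stages. A real system would accumulate the richer statistics needed for linear transforms over an HMM, but the control flow (score, assign or create, accumulate) would follow the same pattern.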
