Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information

Human-machine interaction in meetings requires the localization and identification of the speakers interacting with the system as well as the recognition of the words spoken. A seminal step toward this goal is the field of rich transcription research, which includes speaker diarization together with the annotation of sentence boundaries and the elimination of speaker disfluencies. The sub-area of speaker diarization attempts to identify the number of participants in a meeting and create a list of speech time intervals for each such participant. In this paper, we analyze the correlation between signals coming from multiple microphones and propose an improved method for carrying out speaker diarization for meetings with multiple distant microphones. The proposed algorithm makes use of acoustic information and information from the delays between signals coming from the different sources. Using this procedure, we were able to achieve state-of-the-art performance in the NIST spring 2006 rich transcription evaluation, improving the Diarization Error Rate (DER) by 15% to 20% relative to previous systems.

[1]  Andreas Stolcke,et al.  From switchboard to meetings: development of the 2004 ICSI-SRI-UW meeting recognition system , 2004, INTERSPEECH.

[2]  Jean-Luc Gauvain,et al.  Improving Speaker Diarization , 2004 .

[3]  Michael S. Brandstein,et al.  A robust method for speech signal time-delay estimation in reverberant rooms , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Xavier Anguera Miró,et al.  Robust Speaker Diarization for Meetings: ICSI RT06S Meetings Evaluation System , 2006, MLMI.

[5]  Iain McCowan,et al.  Clustering and segmenting speakers and their locations in meetings , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[7]  Xavier Anguera Miró ROBUST SPEAKER DIARIZATION FOR MEETINGS , 2006 .

[8]  D A Reynolds,et al.  The MIT Lincoln Laboratory RT-04F Diarization Systems: Applications to Broadcast Audio and Telephone Conversations , 2004 .

[9]  Daniel P. W. Ellis,et al.  Using acoustic condition clustering to improve acoustic change detection on broadcast news , 2000, INTERSPEECH.

[10]  Jitendra Ajmera,et al.  A robust speaker clustering algorithm , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[11]  Mark J. F. Gales,et al.  The Cambridge University March 2005 speaker diarisation system , 2005, INTERSPEECH.

[12]  Xavier Anguera Miró,et al.  Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization , 2006, MLMI.

[13]  Xavier Anguera Miró,et al.  Multi-stream speaker diarization systems for the meetings domain , 2006, INTERSPEECH.

[14]  Daniel P. W. Ellis,et al.  Speaker turn segmentation based on between-channel differences , 2004 .

[15]  Ingvar Claesson,et al.  Direction of Arrival Estimation for Multiple Speakers Using Time-Frequency Orthogonal Signal Separation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[16]  Guillaume Lathoud,et al.  Further Applications of Sector-Based Detection and Short-Term Clustering , 2006 .

[17]  S. Chen,et al.  Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion , 1998 .

[18]  Nikki Mirghafori,et al.  Nuts and Flakes: a Study of Data Characteristics in Speaker Diarization , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[19]  Jean-François Bonastre,et al.  Step-by-step and integrated approaches in broadcast news speaker diarization , 2006, Comput. Speech Lang..

[20]  Douglas A. Reynolds,et al.  Speaker diarisation for broadcast news , 2004, Odyssey.

[21]  Iain McCowan,et al.  Location based speaker segmentation , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[22]  Barbara Peskin,et al.  TOWARDS ROBUST SPEAKER SEGMENTATION: THE ICSI-SRI FALL 2004 DIARIZATION SYSTEM , 2004 .

[23]  Jean-François Bonastre,et al.  The ELISA consortium approaches in broadcast news speaker segmentation during the NIST 2003 rich transcription evaluation , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[24]  Xavier Anguera Miró,et al.  Speaker Diarization for Multi-microphone Meetings Using Only Between-Channel Differences , 2006, MLMI.

[25]  X. Anguera,et al.  Speaker diarization for multi-party meetings using acoustic fusion , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[26]  Xavier Anguera Miró,et al.  Speaker diarization for multiple distant microphone meetings: mixing acoustic features and inter-channel time differences , 2006, INTERSPEECH.

[27]  Jean-Luc Gauvain,et al.  Combining speaker identification and BIC for speaker diarization , 2005, INTERSPEECH.

[28]  Andreas Stolcke,et al.  The ICSI Meeting Project: Resources and Research , 2004 .

[29]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[30]  Climent Nadeu,et al.  Hybrid Speech/non-speech detector applied to Speaker Diarization of Meetings , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[31]  Douglas A. Reynolds,et al.  Approaches and applications of audio diarization , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[32]  Xavier Anguera Miró,et al.  Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System , 2005, MLMI.

[33]  Jean-Marc Odobez,et al.  Unsupervised Location-Based Segmentation of Multi-Party Speech , 2004 .

[34]  Xavier Anguera Miró,et al.  Friends and enemies: a novel initialization for speaker diarization , 2006, INTERSPEECH.

[35]  Douglas A. Reynolds,et al.  An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[36]  Andreas Stolcke,et al.  Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System , 2005, MLMI.