Towards Semantic Analysis of Conversations: A System for the Live Identification of Speakers in Meetings

In this article we present an application that identifies, online, who is currently speaking in a meeting, using a single far-field microphone. By combining techniques from speaker identification and speaker diarization, the system can recognize the current speaker after any two seconds of speech. An evaluation of the algorithm's robustness on the AMI meeting corpus and the NIST speaker diarization development set yielded a diarization error rate of 12.67%.
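
For readers unfamiliar with the metric reported above, the diarization error rate is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The following is a minimal frame-level sketch of that definition, not the scoring procedure used in the article. It assumes that both label sequences are already aligned frame by frame, that hypothesis speaker labels have already been mapped to reference labels, that there is no overlapping speech, and that no scoring collar is applied. The function name `frame_der` and the toy labels are hypothetical.

```python
from typing import Optional, Sequence


def frame_der(reference: Sequence[Optional[str]],
              hypothesis: Sequence[Optional[str]]) -> float:
    """Frame-level diarization error rate (simplified sketch).

    Each element is a speaker label for one frame, or None for non-speech.
    Assumes labels are already mapped between reference and hypothesis.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1              # frame counts toward reference speech time
            if hyp is None:
                missed += 1          # speech in reference, silence in hypothesis
            elif hyp != ref:
                confusion += 1       # speech in both, but wrong speaker
        elif hyp is not None:
            false_alarm += 1         # silence in reference, speech in hypothesis

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Toy example (hypothetical labels): one confusion, one miss, one false alarm
# over six reference speech frames.
reference = ["A", "A", "A", "B", "B", None, None, "B"]
hypothesis = ["A", "A", "B", "B", None, None, "A", "B"]
print(frame_der(reference, hypothesis))  # 0.5
```

Standard scoring tools additionally search for the optimal mapping between hypothesis and reference speakers and work on time segments rather than fixed frames, so this sketch only illustrates how the three error types combine into a single rate.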
