Speech activity and speaker novelty detection methods for meeting processing

Segmenting multi-speaker meeting audio recorded with several microphones into speech and silence frames is one of the first tasks in developing a speaker diarization system. Energy normalization techniques and signal correlation methods are used to avoid the crosstalk problem, in which one participant's speech is picked up by other participants' microphones. A comparison of different microphone types and the configuration of the recording devices installed in the intelligent meeting room are described. Special attention is paid to improving the speaker novelty detection performance of the on-line speaker diarization system.
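The abstract names two ways of handling crosstalk: energy normalization across microphones and signal correlation between them. The following is a minimal sketch of how the two can be combined, assuming one microphone per participant, fixed-length non-overlapping frames, and a crosstalk model in which a leaked signal is a delayed, attenuated copy of the talker's own signal. It is not the method described in the paper. All function names (`normalized_energy_vad`, `suppress_crosstalk`, `add_leak`, and the rest), the 0.5 energy ratio, the ±5-sample lag range and the 0.8 correlation threshold are hypothetical illustration values.

```python
import math
from typing import List, Sequence


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy of consecutive, non-overlapping frames."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[t * frame_len:(t + 1) * frame_len]) / frame_len
        for t in range(n_frames)
    ]


def normalized_energy_vad(channels, frame_len, silence_floor=1e-4, ratio=0.5):
    """Per-channel speech/silence decisions from cross-channel energy normalization.

    A frame counts as speech on a microphone when its energy exceeds an absolute
    silence floor AND is at least `ratio` times the loudest microphone's energy
    in the same frame. Weak crosstalk is rejected, while comparably loud
    overlapping speakers are all kept.
    """
    energies = [frame_energies(ch, frame_len) for ch in channels]
    n_frames = min(len(e) for e in energies)
    decisions = [[False] * n_frames for _ in channels]
    for t in range(n_frames):
        loudest = max(e[t] for e in energies)
        for c, e in enumerate(energies):
            decisions[c][t] = e[t] > silence_floor and e[t] >= ratio * loudest
    return decisions


def peak_normalized_xcorr(a, b, max_lag):
    """Largest |normalized cross-correlation| of two frames for lags in
    [-max_lag, max_lag]; values near 1.0 suggest b is a scaled, delayed copy of a."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    n = min(len(a), len(b))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Only index pairs where both i and i + lag fall inside the frame.
        s = sum(a[i] * b[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        best = max(best, abs(s) / norm)
    return best


def suppress_crosstalk(channels, decisions, frame_len, max_lag=5, corr_threshold=0.8):
    """Clear a speech decision when that frame is highly correlated with the
    loudest microphone's frame, i.e. it is likely a delayed, attenuated copy of
    the same talker (crosstalk) rather than the local participant."""
    result = [row[:] for row in decisions]
    for t in range(len(decisions[0])):
        frames = [ch[t * frame_len:(t + 1) * frame_len] for ch in channels]
        energies = [sum(x * x for x in f) for f in frames]
        loudest = energies.index(max(energies))
        for c, frame in enumerate(frames):
            if c != loudest and result[c][t]:
                if peak_normalized_xcorr(frame, frames[loudest], max_lag) >= corr_threshold:
                    result[c][t] = False
    return result


def add_leak(source, delay=3, gain=0.8):
    """Hypothetical crosstalk model: a delayed, attenuated copy of `source`."""
    return [gain * source[i - delay] if i >= delay else 0.0 for i in range(len(source))]


if __name__ == "__main__":
    # Synthetic two-microphone recording: talker 0 speaks in frames 0-1,
    # talker 1 in frames 2-3, and each leaks into the other's microphone.
    n = 400
    talk0 = [math.sin(2 * math.pi * i / 20) if i < 200 else 0.0 for i in range(n)]
    talk1 = [math.sin(2 * math.pi * i / 25) if i >= 200 else 0.0 for i in range(n)]
    ch0 = [s + x for s, x in zip(talk0, add_leak(talk1))]
    ch1 = [s + x for s, x in zip(talk1, add_leak(talk0))]
    energy_only = normalized_energy_vad([ch0, ch1], frame_len=100)
    final = suppress_crosstalk([ch0, ch1], energy_only, frame_len=100)
    print(energy_only)  # the 0.8-gain leak passes the 0.5 energy ratio: all True
    print(final)        # [[True, True, False, False], [False, False, True, True]]
```

In this toy example the leak is loud enough to pass the energy test on its own, which is why the correlation step is needed to reject it. Comparing each frame only against the loudest microphone keeps the cost linear in the number of channels. Real speech would call for longer, overlapping frames, a lag range matched to the room geometry and sampling rate, and thresholds tuned on data, because two genuinely different talkers can still correlate noticeably over a short frame.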
