Paper Information - Online Speech Activity Detection in Broadcast News

Online Speech Activity Detection in Broadcast News

In this paper, we investigate the implications of real-time processing for the design of a speech activity detection (SAD) system, focusing on the constraints imposed by online automatic speech recognition. Our investigation is built on a real-life application of speech technology, the BBN Broadcast Monitoring System (BMS), which encapsulates a real-time automatic rich transcription system. We propose a segmentation method capable of variable-scale speech boundary detection in an online SAD system, and we evaluate how different granularities of boundary detection affect the performance of speech-to-text (STT) and speaker diarization. In addition, we evaluate the interactions between STT and speaker diarization and study mechanisms for trading off the performance of these two system components. In our experiment, the segmentation mechanism in the proposed SAD system reduces the error rates of STT and speaker diarization by 2.4% and 9.5% relative, respectively, compared to the baseline system.
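
The reported improvements are relative reductions, that is, fractions of the baseline error rate rather than absolute percentage-point differences. A minimal sketch of that arithmetic follows; the function name `relative_reduction` and the baseline and new error values are hypothetical and chosen only for illustration. They are not figures from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, for illustration only;
# the abstract does not state the absolute baseline error rates.
hypothetical_baseline = 20.0
hypothetical_new = 19.52

reduction = relative_reduction(hypothetical_baseline, hypothetical_new)
print(f"{reduction:.1%} relative reduction")
```

With these assumed values, an absolute drop of 0.48 percentage points corresponds to a relative reduction of about 2.4%, which shows why relative and absolute figures should not be conflated.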