A Broadcast News Corpus for Evaluation and Tuning of German LVCSR Systems

Transcription of broadcast news is an interesting and challenging application for large-vocabulary continuous speech recognition (LVCSR). We present in detail the structure of a manually segmented and annotated corpus including over 160 hours of German broadcast news, and propose it as an evaluation framework of LVCSR systems. We show our own experimental results on the corpus, achieved with a state-of-the-art LVCSR decoder, measuring the effect of different feature sets and decoding parameters, and thereby demonstrate that real-time decoding of our test set is feasible on a desktop PC at 9.2% word error rate.

[1]  Christoph Neukirchen,et al.  DUcoder-the Duisburg University LVCSR stackdecoder , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[2]  Gerhard Rigoll The ALERT system: advanced broadcast speech recognition technology for selective dissemination of multimedia information , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[3]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[4]  Hermann Ney,et al.  Large vocabulary continuous speech recognition of Broadcast News - The Philips/RWTH approach , 2002, Speech Commun..

[5]  Martine Adda-Decker,et al.  The 300k LIMSI German broadcast news transcription system , 2003, INTERSPEECH.

[6]  Tanja Schultz,et al.  The ISL RT04 Mandarin Broadcast News Evaluation System , 2004 .

[7]  Alexandre Allauzen,et al.  Where are we in transcribing French broadcast news? , 2005, INTERSPEECH.

[8]  Steve Young,et al.  The HTK book version 3.4 , 2006 .

[9]  Mark J. F. Gales,et al.  Progress in the CU-HTK broadcast news transcription system , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[10]  Jean-Luc Gauvain,et al.  Improved acoustic modeling for transcribing Arabic broadcast data , 2007, INTERSPEECH.

[11]  Björn W. Schuller,et al.  OpenEAR — Introducing the munich open-source emotion and affect recognition toolkit , 2009, 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops.

[12]  Björn W. Schuller,et al.  Non-negative matrix factorization as noise-robust feature extractor for speech recognition , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.