Recent improvements of the RWTH GALE Mandarin LVCSR system

This paper describes recent improvements to the RWTH Mandarin LVCSR system. We introduce a new reduced toneme set developed at RWTH. The systems use different toneme sets and pronunciation lexica, so for discriminative training we present a fast way to transform word lattices between systems that use different toneme sets and pronunciation lexica. In addition to various acoustic front-ends, the current systems use different kinds of neural network toneme posterior features. To combine these different systems, a two-stage decoding framework is applied. We present detailed recognition results for the development cycle of the systems. Finally, we compare two methods for integrating tonal features.
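The abstract does not spell out how the lattice transformation works. The following is a minimal sketch, not the RWTH procedure. It assumes a lattice stored as a flat list of word arcs and a target lexicon that maps each word to its pronunciation variants in the target toneme set. All names here (`Arc`, `transform_lattice`, the toy words and tonemes) are hypothetical. The sketch keeps the lattice topology and language model scores and re-expands each word arc with the target system's pronunciations. It clears the acoustic scores so that the target acoustic model can recompute them before discriminative training.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# A pronunciation is a sequence of (hypothetical) toneme symbols.
Pron = Tuple[str, ...]


@dataclass(frozen=True)
class Arc:
    src: int                  # start node of the word arc
    dst: int                  # end node of the word arc
    word: str                 # word label (unchanged by the transformation)
    pron: Pron                # pronunciation in the current toneme set
    lm_score: float           # language model score (kept as-is)
    am_score: Optional[float]  # acoustic score; None means "rescore with target AM"


def transform_lattice(
    arcs: List[Arc], target_lexicon: Dict[str, List[Pron]]
) -> Tuple[List[Arc], List[str]]:
    """Re-express word-lattice arcs in a target lexicon (illustrative sketch).

    Node structure, word labels and LM scores are preserved. Each word arc is
    expanded into one arc per target pronunciation variant; arcs that differed
    only in their source pronunciation collapse to the same target arcs.
    Acoustic scores are cleared because they depend on the source toneme set.
    Returns the new arcs and the words missing from the target lexicon.
    """
    out: List[Arc] = []
    missing: List[str] = []
    seen: Set[Tuple[int, int, str, Pron]] = set()
    for arc in arcs:
        prons = target_lexicon.get(arc.word)
        if not prons:
            missing.append(arc.word)  # report instead of keeping an unusable arc
            continue
        for pron in prons:
            key = (arc.src, arc.dst, arc.word, pron)
            if key in seen:
                continue  # avoid duplicate arcs from source pronunciation variants
            seen.add(key)
            out.append(Arc(arc.src, arc.dst, arc.word, pron, arc.lm_score, None))
    return out, missing
```

Because only pronunciations change while word labels and nodes are kept, the set of word hypotheses in the lattice is unchanged. Two source arcs for the same word that differ only in pronunciation map to a single arc per target pronunciation. Words absent from the target lexicon are returned explicitly rather than silently kept.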
