Hierarchical processing of the modulation spectrum for GALE Mandarin LVCSR system

This paper investigates TANDEM features based on hierarchical processing of the modulation spectrum. The study is carried out in the framework of the GALE project for recognition of Mandarin broadcast data. We describe the improvements obtained using the hierarchical processing and the addition of features such as pitch and short-term critical band energy. The results are consistent with previous findings on a different LVCSR task, suggesting that the proposed technique is effective and robust across several conditions. Furthermore, we describe the integration into the RWTH GALE LVCSR system trained on 1600 hours of Mandarin data and present progress across the GALE 2007 and GALE 2008 RWTH systems, resulting in an approximately 20% CER reduction on several data sets.
