Using stacked transformations for recognizing foreign-accented speech

A common problem in speech recognition of foreign-accented speech is the lack of training data for an accent-specific or speaker-specific recognizer. Speaker adaptation can improve the accuracy of a speaker-independent recognizer, but speakers with a strong foreign accent require a large amount of adaptation data. In this paper we propose a simple yet effective technique of stacked transformations, in which the baseline models trained for native speakers are first adapted with a transformation estimated from accent-specific data, and then with a second transformation estimated from speaker-specific data. Because the accent-specific data can be collected offline, the first transformation can be detailed and comprehensive, while the second can be less detailed and fast to estimate. Experimental results are provided for speaker adaptation in English spoken by Finnish speakers. The evaluation confirms that stacked transformations are very helpful for fast speaker adaptation.
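The core idea of composing an offline accent-level transformation with a lightweight online speaker-level one can be illustrated with a minimal sketch. The sketch below is an assumption-laden illustration, not the paper's implementation: it treats both stages as MLLR-style affine transforms of Gaussian mean vectors, uses a full matrix for the accent stage (estimated offline from ample data) and a diagonal matrix for the speaker stage (cheap to estimate from little data), and all names (`adapt_means`, `A_accent`, etc.) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3  # acoustic feature dimension (illustrative)

# Gaussian mean vectors of the baseline model trained on native speakers
means = rng.normal(size=(5, dim))

# Stage 1: accent-specific affine transform, estimated offline on pooled
# accented data, so it can afford a detailed full matrix.
A_accent = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
b_accent = 0.05 * rng.normal(size=dim)

# Stage 2: speaker-specific transform, estimated online from little data,
# so it is kept less detailed (diagonal) and fast to compute.
A_speaker = np.diag(1.0 + 0.05 * rng.normal(size=dim))
b_speaker = 0.02 * rng.normal(size=dim)

def adapt_means(means, A, b):
    """Apply one affine (MLLR-style) transform to every mean vector."""
    return means @ A.T + b

# Stacked transformations: accent adaptation first, speaker adaptation on top.
accent_means = adapt_means(means, A_accent, b_accent)
stacked_means = adapt_means(accent_means, A_speaker, b_speaker)
```

Because affine maps compose, stacking is equivalent to a single transform with matrix `A_speaker @ A_accent` and bias `A_speaker @ b_accent + b_speaker`; the benefit is that the expensive first factor is reused across all speakers with the same accent.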