Japanese large-vocabulary continuous speech recognition system based on microsoft whisper

Input of Asian ideographic characters has traditionally been one of the biggest impediments for information processing in Asia. Speech is arguably the most effective and efficient input method for Asian non-spelling characters. This paper presents a Japanese large-vocabulary continuous speech recognition system based on Microsoft Whisper technology. We focus on the aspects of the system that are language specific and demonstrate the adaptability of the Whisper system to new languages. In this paper, we demonstrate that our pronunciation/part-of-speech distinguished morpheme based language models and Whisper based Japanese senonic acoustic models are able to yield state-of-the-art Japanese LVCSR recognition performance. The speaker-independent character and Kana error rates on the JNAS database are 10% and 5% respectively.

[1]  Katsuhiko Shirai,et al.  Japanese large-vocabulary continuous-speech recognition using a business-newspaper corpus , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[3]  Michael Picheny,et al.  Performance of the IBM large vocabulary continuous speech recognition system on the ARPA Wall Street Journal task , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[4]  Mei-Yuh Hwang,et al.  Predicting unseen triphones with senones , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5]  Ken Lunde,et al.  Understanding Japanese Information Processing , 1993 .

[6]  Baosheng Yuan,et al.  Towards large vocabulary Mandarin Chinese speech recognition , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[7]  Mei-Yuh Hwang,et al.  Microsoft Windows highly intelligent speech recognizer: Whisper , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.