Speech recognition for Japanese spoken language

This paper gives an overview of speech recognition issues for the Japanese language and introduces three research projects conducted at NTT (Nippon Telegraph and Telephone) Human Interface Laboratories and ATR Interpreting Telephony Research Laboratories. The first topic is stochastic language models of Japanese character sequences, used in a Japanese dictation system with an unlimited vocabulary. The second topic is an accurate and efficient algorithm for very-large-vocabulary continuous speech recognition based on an HMM-LR algorithm; this algorithm was applied to a telephone directory assistance system that recognizes spontaneous speech with a vocabulary of roughly 80,000 words. The third topic is a continuous speech recognition system based on phoneme-context-dependent (allophonic) modeling and parsing strategies.
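The character-sequence language model mentioned as the first topic is, in essence, an n-gram model defined over Japanese characters rather than words, which sidesteps the need for a fixed word vocabulary. The following is a minimal sketch of such a model, assuming a character trigram with additive smoothing; the class name, smoothing constant, and toy corpus are illustrative assumptions, not the system described in the paper.

```python
from collections import defaultdict
import math

class CharTrigramLM:
    """Character-level trigram language model with additive (Laplace) smoothing.

    A rough sketch of the kind of stochastic character-sequence model the
    abstract refers to; the actual dictation system's modeling details
    (context length, smoothing scheme) are not reproduced here.
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha                  # additive smoothing constant (assumed)
        self.trigram = defaultdict(int)     # counts of (c1, c2, c3)
        self.bigram = defaultdict(int)      # counts of (c1, c2)
        self.vocab = set()

    def train(self, sentences):
        for s in sentences:
            chars = ["<s>", "<s>"] + list(s) + ["</s>"]
            self.vocab.update(chars)
            for i in range(2, len(chars)):
                self.trigram[(chars[i - 2], chars[i - 1], chars[i])] += 1
                self.bigram[(chars[i - 2], chars[i - 1])] += 1

    def log_prob(self, sentence):
        """Log-probability of a character sequence under the smoothed trigram model."""
        chars = ["<s>", "<s>"] + list(sentence) + ["</s>"]
        v = len(self.vocab)
        total = 0.0
        for i in range(2, len(chars)):
            num = self.trigram[(chars[i - 2], chars[i - 1], chars[i])] + self.alpha
            den = self.bigram[(chars[i - 2], chars[i - 1])] + self.alpha * v
            total += math.log(num / den)
        return total

# Illustrative usage with a toy corpus (romanized here for readability).
lm = CharTrigramLM()
lm.train(["kyou wa ii tenki desu", "denwa bangou wo oshiete kudasai"])
print(lm.log_prob("denwa bangou"))
```

In a dictation setting, such a model would rank candidate character sequences proposed by the acoustic decoder; because the unit is the character, out-of-vocabulary words do not arise in the usual sense.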
