Automatic generation of multiple pronunciations based on neural networks

We propose a method for automatically generating a pronunciation dictionary using a pronunciation neural network that predicts plausible alternative pronunciations from the canonical pronunciation. The network can generate multiple alternative pronunciation forms. To build a more refined alternative pronunciation dictionary, we describe two techniques: (1) attaching likelihoods to alternative pronunciations and (2) generating alternative pronunciations for word-boundary phonemes. Experimental results on spontaneous speech show that the automatically derived pronunciation dictionaries give consistently higher recognition rates than a conventional dictionary.
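As a rough illustration of how alternative pronunciations with likelihoods might be enumerated, the sketch below assumes that a network yields, for each canonical phoneme, a probability distribution over realized phonemes. The function name `generate_pronunciations`, the `threshold` and `max_variants` parameters, the `"-"` deletion symbol, and the toy probabilities are all hypothetical and are not taken from the paper; the likelihood of a variant is simply assumed to be the product of its per-phoneme probabilities.

```python
from itertools import product
from math import prod


def generate_pronunciations(posteriors, threshold=0.1, max_variants=5):
    """Enumerate alternative pronunciations from per-phoneme distributions.

    posteriors: one dict per canonical phoneme, mapping a realized phoneme
    (or "-" for deletion) to its probability. In a real system these would
    come from a trained network; here they are supplied by the caller.
    """
    # Keep only the candidates at each position that reach the threshold.
    candidates = [
        [(ph, p) for ph, p in dist.items() if p >= threshold]
        for dist in posteriors
    ]
    merged = {}
    for combo in product(*candidates):
        # Drop deletion symbols to obtain the surface phoneme sequence.
        phones = tuple(ph for ph, _ in combo if ph != "-")
        # Assumed scoring: product of the per-position probabilities.
        likelihood = prod(p for _, p in combo)
        # Different alignments can collapse to the same sequence; sum them.
        merged[phones] = merged.get(phones, 0.0) + likelihood
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_variants]


# Toy distributions for a three-phoneme canonical form (made-up values).
toy_posteriors = [
    {"k": 0.9, "g": 0.1},
    {"a": 1.0},
    {"i": 0.7, "-": 0.3},
]
variants = generate_pronunciations(toy_posteriors, threshold=0.2)
for phones, likelihood in variants:
    print(" ".join(phones), round(likelihood, 3))
```

Thresholding per position keeps the number of enumerated variants small; a dictionary built this way could store each variant together with its likelihood so that a recognizer can weight the alternatives.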
