Paper information - Automatic generation of multiple pronunciations based on neural networks

We propose a method for automatically generating a pronunciation dictionary using a pronunciation neural network that predicts plausible alternative pronunciations from the canonical pronunciation. The network can generate multiple alternative pronunciations. To build a more refined alternative pronunciation dictionary, we describe two techniques: (1) attaching likelihoods to alternative pronunciations and (2) generating alternative pronunciations for word-boundary phonemes. Experimental results on spontaneous speech show that the automatically derived pronunciation dictionaries give consistently higher recognition rates than a conventional dictionary.
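As a rough illustration of what "alternative pronunciations with likelihoods" could look like, the sketch below ranks candidate pronunciations derived from a canonical phoneme sequence. It is not the paper's implementation: the table `TOY_POSTERIORS`, the function `alternative_pronunciations`, and all probabilities are hypothetical, and the sketch assumes each canonical phoneme is realized independently, with a lookup table standing in for the network's output.

```python
from itertools import product
from math import prod

# Hypothetical stand-in for a pronunciation network's output: for each
# canonical phoneme, a distribution over realized phonemes ("" = deletion).
# All values are invented for illustration only.
TOY_POSTERIORS = {
    "a": {"a": 0.9, "o": 0.1},
    "r": {"r": 0.7, "": 0.3},
    "u": {"u": 0.8, "": 0.2},
}


def alternative_pronunciations(canonical, posteriors, top_n=3):
    """Return up to top_n (pronunciation, likelihood) pairs, best first.

    Assumes phonemes are realized independently, so the likelihood of a
    pronunciation is the product of its per-phoneme probabilities.
    """
    per_position = [sorted(posteriors[p].items()) for p in canonical]
    scored = {}
    for combo in product(*per_position):
        # Drop deleted phonemes ("") from the realized sequence.
        phones = tuple(ph for ph, _ in combo if ph)
        likelihood = prod(pr for _, pr in combo)
        # Different choices can collapse to the same sequence; sum them.
        scored[phones] = scored.get(phones, 0.0) + likelihood
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


entries = alternative_pronunciations(["a", "r", "u"], TOY_POSTERIORS)
for phones, likelihood in entries:
    print(" ".join(phones), round(likelihood, 3))
```

Keeping only the top-ranked entries is one simple way to limit dictionary size; a likelihood threshold would be another. The source does not state which criterion the authors use.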
