Adapting grapheme-to-phoneme conversion for name recognition

This work investigates the use of acoustic data to improve grapheme-to-phoneme conversion for name recognition. We introduce a joint model of acoustics and graphonemes and present two approaches to adapting the graphoneme model parameters: maximum likelihood training and discriminative training. Experiments on a large-scale voice-dialing system show that the maximum likelihood approach yields a 7% relative reduction in SER over the best baseline we obtained without acoustic data, and that discriminative training increases that reduction to 12%.
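
As a reading aid for the reported figures, the following minimal sketch shows how a relative reduction translates into an absolute error rate. The baseline value is purely hypothetical (the abstract does not state the baseline SER), and the function names are illustrative assumptions, not part of the described system.

```python
def relative_reduction(baseline: float, adapted: float) -> float:
    """Fraction by which the error rate fell, measured against the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - adapted) / baseline


def apply_relative_reduction(baseline: float, reduction: float) -> float:
    """Error rate that results from applying a relative reduction to a baseline."""
    return baseline * (1.0 - reduction)


# HYPOTHETICAL baseline SER, chosen only for illustration; not from the paper.
HYPOTHETICAL_BASELINE_SER = 0.20

# Relative reductions reported in the abstract: 7% (maximum likelihood), 12% (discriminative).
ml_ser = apply_relative_reduction(HYPOTHETICAL_BASELINE_SER, 0.07)
dt_ser = apply_relative_reduction(HYPOTHETICAL_BASELINE_SER, 0.12)

print(f"maximum likelihood: {ml_ser:.3f}")
print(f"discriminative:     {dt_ser:.3f}")
```

Under this assumed baseline, a 7% relative reduction corresponds to a drop of 1.4 absolute points, not 7 points, which is the distinction the word "relative" carries in the abstract.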
