Paper information - On the use of lattices for the automatic generation of pronunciations

In this paper, we explore the use of lattices to generate pronunciations for speech recognition based on the observation of a few (say one or two) speech utterances of a word. Various search strategies are investigated in combination with schemes where single or multiple pronunciations are generated for each speech utterance. In our experiments, a strategy that combines merging time-overlapping links in a context-dependent subphone lattice and generating multiple pronunciations provides the best recognition accuracy. This results in average relative gains of 30% over the generation of single pronunciations using a Viterbi search.
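The abstract names merging time-overlapping links in a lattice but does not give the merging rule. The following is a minimal sketch of one plausible reading, not the authors' procedure: the `Link` fields, the overlap test (same label, intersecting frame ranges), and the choice to sum posteriors of merged links are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Link(NamedTuple):
    """Hypothetical lattice link; field names are illustrative assumptions."""
    label: str        # phone or subphone label
    start: int        # first frame, inclusive
    end: int          # last frame, exclusive
    posterior: float  # link posterior probability


def merge_overlapping_links(links: List[Link]) -> List[Link]:
    """Merge links that share a label and overlap in time.

    Assumption: a merged link spans the union of the frame ranges and
    accumulates the posteriors of the links it absorbs.
    """
    by_label = defaultdict(list)
    for link in links:
        by_label[link.label].append(link)

    merged = []
    for label, group in by_label.items():
        group.sort(key=lambda l: (l.start, l.end))
        current = group[0]
        for link in group[1:]:
            if link.start < current.end:
                # Same label and intersecting time span: fold into current.
                current = Link(
                    label,
                    current.start,
                    max(current.end, link.end),
                    current.posterior + link.posterior,
                )
            else:
                merged.append(current)
                current = link
        merged.append(current)

    # Return links in time order for readability.
    return sorted(merged, key=lambda l: (l.start, l.label))


example = [
    Link("AH", 0, 10, 0.5),
    Link("AH", 5, 12, 0.25),   # overlaps the first AH link
    Link("B", 12, 20, 0.9),
    Link("AH", 30, 40, 0.2),   # same label, but no overlap
]
result = merge_overlapping_links(example)
```

In this toy input the two overlapping `AH` links collapse into one link covering frames 0 to 12, while the later `AH` link stays separate because it does not overlap in time.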

Sabine Deligne | Lidia Mangu

[1] Andreas Stolcke, et al. Finding consensus in speech recognition: word error minimization and other applications of confusion networks, 2000, Comput. Speech Lang.

[2] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[3] Stanley F. Chen, et al. An empirical study of smoothing techniques for language modeling, 1999.

[4] Benoît Maison, et al. Automatic generation and selection of multiple pronunciations for dynamic vocabularies, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[5] Mark J. F. Gales, et al. Automatic transcription of Broadcast News, 2002, Speech Commun.

[6] Michael Picheny, et al. Decision trees for phonological rules in continuous speech, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[7] Bhuvana Ramabhadran, et al. Acoustics-only based automatic phonetic baseform generation, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[8] Mark J. F. Gales, et al. Recent improvements to IBM's speech recognition system for automatic transcription of broadcast news, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[9] Eduardo Lleida, et al. Speech recognition using automatically derived acoustic baseforms, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.