The use of accent-specific pronunciation dictionaries in acoustic model training

Speech recognition systems are increasingly being built to cover an ever wider range of speaker accents. However, electronically available pronunciation dictionaries (PDs) specific to these accents often do not exist, and would be time-consuming and expensive to build by hand. This paper explores the use of pronunciation modelling to synthesise accent-specific PDs directly from acoustic data, and the use of these PDs in acoustic model training. The approach is shown to be particularly effective when the acoustic data from the new accent region is insufficient to build a new recogniser, so that an existing system must be retrained instead: a further 15% reduction in word error rate can be achieved over and above the 20% reduction resulting from acoustic model retraining alone. This paper also presents an empirical evaluation of an American English PD which has been synthesised from a British English PD.
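
The abstract does not say whether the 15% and 20% figures are relative or absolute, or what the baseline word error rate is. As a rough illustration only, the sketch below contrasts two possible readings of "a further 15% over and above 20%": relative reductions applied one after the other, and reductions simply added against the same baseline. The function names and the baseline value are hypothetical and are not taken from the paper.

```python
def compounded_reduction(*reductions):
    """Overall relative reduction when each reduction is applied to the
    error rate left over by the previous one (assumed interpretation)."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining


def additive_reduction(*reductions):
    """Overall reduction when every figure is measured against the same
    original baseline (alternative assumed interpretation)."""
    return sum(reductions)


# Hypothetical baseline WER, chosen only to make the arithmetic concrete.
baseline_wer = 0.30

compounded = compounded_reduction(0.20, 0.15)
additive = additive_reduction(0.20, 0.15)

print(f"compounded: {compounded:.2%} overall, WER {baseline_wer * (1 - compounded):.4f}")
print(f"additive:   {additive:.2%} overall, WER {baseline_wer * (1 - additive):.4f}")
```

Under the compounding reading the two steps together remove 32% of the original errors; under the additive reading they remove 35%. Which reading applies should be checked against the results reported in the paper itself.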
