Using accent-specific pronunciation modelling for improved large vocabulary continuous speech recognition

A method of modelling accent-specific pronunciation variations is presented. Speech from an unseen accent group is phonetically transcribed such that pronunciation variations may be derived. These context-dependent variations are clustered in decision trees which are used as a model of the pronunciation variation associated with this new accent group. The trees are then used to build a new pronunciation dictionary for use during the recognition process. Experiments are presented, based on Wall Street Journal and WSJCAM0 corpora, for the recognition of American speakers using a British English recogniser. Speaker independent as well as speaker dependent adaptation scenarios are presented, giving up to 20% reduction in word error rate. A linguistic analysis of the pronunciation model is presented and finally the technique is combined with maximum likelihood linear regression, a well proven acoustic adaptation technique, yielding further improvement.

[1]  Arthur J. Bronstein,et al.  The Pronunciation of American English , 1960 .

[2]  John C. Wells,et al.  Accents of English , 1982 .

[3]  S. J. Young,et al.  Tree-based state tying for high accuracy acoustic modelling , 1994 .

[4]  Stephen J. Cox,et al.  Confidence measures for the SWITCHBOARD database , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[5]  Janet M. Baker,et al.  The Design for the Wall Street Journal-based CSR Corpus , 1992, HLT.

[6]  Edward J. Delp,et al.  An iterative growing and pruning algorithm for classification tree design , 1989, Conference Proceedings., IEEE International Conference on Systems, Man and Cybernetics.

[7]  Philip C. Woodland,et al.  Using accent-specific pronunciation modelling for robust speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[8]  Larry Gillick,et al.  A probabilistic approach to confidence estimation and evaluation , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Michael Cohen,et al.  A phone-dependent confidence measure for utterance rejection , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[10]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[11]  Steve Young,et al.  WSJCAM0 corpus and recording description , 1994 .

[12]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..