Paper information - Using accent-specific pronunciation modelling for improved large vocabulary continuous speech recognition

A method of modelling accent-specific pronunciation variations is presented. Speech from an unseen accent group is phonetically transcribed so that pronunciation variations can be derived. These context-dependent variations are clustered in decision trees, which serve as a model of the pronunciation variation associated with the new accent group. The trees are then used to build a new pronunciation dictionary for use during recognition. Experiments based on the Wall Street Journal and WSJCAM0 corpora are presented for the recognition of American speakers using a British English recogniser. Both speaker-independent and speaker-dependent adaptation scenarios are presented, giving up to a 20% reduction in word error rate. A linguistic analysis of the pronunciation model is presented, and finally the technique is combined with maximum likelihood linear regression, a well-proven acoustic adaptation technique, yielding further improvement.
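The pipeline described in the abstract (derive context-dependent variations from transcribed speech, then rewrite the dictionary) can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the decision-tree clustering with a plain lookup table keyed on the exact left and right phone context, handles substitutions only (canonical and observed phone strings are assumed to be pre-aligned and of equal length), and the function names and phone symbols are hypothetical.

```python
from collections import Counter, defaultdict

BOUNDARY = "#"  # assumed word-boundary marker, not taken from the paper


def learn_rules(aligned_pairs):
    """Count observed phones per (left, canonical, right) context.

    aligned_pairs: iterable of (canonical, observed) phone lists of equal
    length. Returns a dict mapping each seen context to its most frequent
    observed phone. A decision tree would generalise across contexts; this
    table only memorises contexts it has seen.
    """
    counts = defaultdict(Counter)
    for canonical, observed in aligned_pairs:
        padded = [BOUNDARY] + list(canonical) + [BOUNDARY]
        for i, surface in enumerate(observed):
            context = (padded[i], padded[i + 1], padded[i + 2])
            counts[context][surface] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def adapt_pronunciation(canonical, rules):
    """Rewrite one pronunciation, keeping the canonical phone when the
    context was never observed."""
    padded = [BOUNDARY] + list(canonical) + [BOUNDARY]
    return [
        rules.get((padded[i], padded[i + 1], padded[i + 2]), padded[i + 1])
        for i in range(len(canonical))
    ]


def adapt_dictionary(dictionary, rules):
    """Build an accent-adapted dictionary from a canonical one."""
    return {word: adapt_pronunciation(p, rules) for word, p in dictionary.items()}


# Toy data: the canonical vowel "oh" is realised as "aa" before "t".
aligned = [
    (["h", "oh", "t"], ["h", "aa", "t"]),
    (["n", "oh", "t"], ["n", "aa", "t"]),
]
rules = learn_rules(aligned)
adapted = adapt_dictionary(
    {"hot": ["h", "oh", "t"], "hop": ["h", "oh", "p"]}, rules
)
print(adapted)
```

In this toy run "hot" is rewritten because its vowel context was observed, while "hop" is left unchanged because the context ("h", "oh", "p") never appeared in the data. Clustering contexts in a decision tree, as the abstract describes, is one way to generalise to such unseen contexts.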

Philip C. Woodland | Jason J. Humphries
