The 1995 ABBOT LVCSR system for multiple unknown microphones

ABBOT is a hybrid (connectionist-hidden Markov model) large-vocabulary speech recognition (LVCSR) system, developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes, which are used as observation probabilities within an HMM. This paper describes the system which participated in the November 1995 ARPA Hub-3 multiple unknown microphones (MUM) evaluation of continuous speech recognition systems, under the guise of the CU-CON system. The emphasis of the paper is on the changes made to the 1994 ABBOT system, specifically to accommodate the H3 task. This includes improved acoustic modelling using limited word-internal context-dependent models, training on the Wall Street Journal secondary channel database, and using the linear input network for speaker and environmental adaptation. Experimental results are reported for various test and development sets from the November 1994 and 1995 ARPA benchmark tests.

[1]  Ciro Martins,et al.  Unsupervised Speaker-Adaptation For Hybrid Hmm-Mlp Continuous Speech Recognition System , 1995 .

[2]  Anthony J. Robinson,et al.  Boosting the performance of connectionist large vocabulary speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[3]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[4]  Anthony J. Robinson,et al.  An application of recurrent nets to phone probability estimation , 1994, IEEE Trans. Neural Networks.

[5]  Ciro Martins,et al.  Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system , 1995, EUROSPEECH.

[6]  Hervé Bourlard,et al.  Continuous speech recognition by connectionist statistical methods , 1993, IEEE Trans. Neural Networks.

[7]  Jonathan G. Fiscus,et al.  1993 Benchmark Tests for the ARPA Spoken Language Program , 1994, HLT.

[8]  Steve Renals,et al.  Efficient search using posterior phone probability estimates , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[9]  Steve R. Waterhouse,et al.  Smoothed local adaptation of connectionist systems , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[10]  Anthony J. Robinson,et al.  Context-Dependent Classes in a Hybrid Recurrent Network-HMM Speech Recognition System , 1995, NIPS.