Transcribing broadcast news with the 1997 Abbot System

Previous DARPA CSR evaluations have focused on the transcription of broadcast news from both television and radio programmes. This is a challenging task because the data includes a variety of speaking styles and channel conditions. This paper describes the development of a connectionist-hidden Markov model (HMM) system, and the enhancements designed to improve its performance on broadcast news data. Both multilayer perceptron (MLP) and recurrent neural network acoustic models have been investigated. We assess the effect of using gender-dependent acoustic models, and the impact on performance of varying both the number of parameters and the amount of training data used for acoustic modelling. The use of context-dependent phone models is described, and the effect of the number of context classes is investigated. We also describe a method for incorporating syllable boundary information during search. Results are reported on the 1997 DARPA Hub-4 development test set.
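
A connectionist-HMM system of this kind typically uses the network outputs as HMM emission scores by exploiting the fact that a frame classifier trained with a cross-entropy criterion estimates phone posterior probabilities; dividing these by the phone priors yields scaled likelihoods for decoding. The sketch below illustrates this conversion only; it is a minimal, hypothetical example (the function name, array shapes, and phone-set size are assumptions) and does not reproduce the Abbot system's actual decoder interface.

```python
# Minimal sketch of posterior-to-scaled-likelihood conversion in a hybrid
# connectionist/HMM system. The network estimates P(q | x_t) per frame;
# dividing by the phone priors P(q) gives p(x_t | q) / p(x_t), which can be
# used (in the log domain) as HMM emission scores during Viterbi decoding.
import numpy as np

def scaled_likelihoods(posteriors, priors, floor=1e-8):
    """Convert per-frame phone posteriors to scaled likelihoods.

    posteriors: array of shape (T, Q) -- network outputs per frame
    priors:     array of shape (Q,)   -- phone priors, e.g. estimated from
                                         the training alignments
    """
    priors = np.maximum(priors, floor)   # avoid division by zero
    return posteriors / priors           # proportional to p(x_t | q)

# Hypothetical usage with random numbers standing in for network outputs:
T, Q = 100, 54                           # 100 frames, 54 phone classes (assumed)
post = np.random.dirichlet(np.ones(Q), size=T)
priors = post.mean(axis=0)
scores = np.log(scaled_likelihoods(post, priors) + 1e-30)  # log-domain scores
```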
