Hybrid HMM-NN for speech recognition and prior class probabilities

In recent years, speech recognition technologies have begun migrating from research laboratories to real-world applications, gaining market share. Although this shows that paradigms such as Neural Networks have reached a high level of accuracy in modeling speech, there is still room to improve recognition performance by exploiting feedback from the application fields. In these settings, valuable application-dependent speech material can be recorded and used to train the acoustic models, improving the behaviour of the recognizer on the target dictionaries. The best results are achieved when an iterative refinement process is set up. Unfortunately, speech corpora collected in the field are seldom phonetically balanced, and this can degrade the performance of the Neural Network, wasting the benefits of the refinement process. This paper addresses the problem of Prior Probability normalization and investigates a normalization method whose key property is that it can be applied simply by modifying the network biases at the end of the training phase, that is, to already trained networks. Experiments on several languages are reported, showing that Prior Probability normalization appears useful both for improving recognition accuracy and for removing some undesired effects of training databases that are not perfectly phonetically balanced.
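The abstract does not give the normalization formula. One common way to realize a bias-only prior normalization in hybrid HMM-NN systems (an assumption here, not necessarily the authors' exact procedure) relies on the output layer being a softmax that estimates posteriors P(q|x). Subtracting log P(q) from the bias of output unit q makes the outputs proportional to P(q|x)/P(q), so the trained network behaves as if its training classes had been uniformly distributed. The sketch below assumes a softmax output layer and priors estimated as class frequencies in the training data; `normalize_biases` and `rescale_posteriors` are hypothetical helper names, and all numbers are made up.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over output-layer activations."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_biases(biases: Sequence[float], priors: Sequence[float]) -> List[float]:
    """Hypothetical helper: shift each output bias by -log P(q) + log(1/K).

    Assumes a softmax output layer and class priors P(q) taken from
    training-set frequencies. The log(1/K) term is a constant that cancels
    under softmax; it is kept so that uniform priors leave biases unchanged.
    """
    if len(biases) != len(priors):
        raise ValueError("one prior per output unit is required")
    k = len(priors)
    return [b - math.log(p) + math.log(1.0 / k) for b, p in zip(biases, priors)]


def rescale_posteriors(posteriors: Sequence[float], priors: Sequence[float]) -> List[float]:
    """Explicit alternative: divide posteriors by priors, then renormalize."""
    scaled = [p / pr for p, pr in zip(posteriors, priors)]
    s = sum(scaled)
    return [x / s for x in scaled]


if __name__ == "__main__":
    # Toy 3-class example; all values are illustrative only.
    activations = [1.2, 0.3, -0.5]  # weighted input to each output unit, before bias
    biases = [0.4, -0.1, 0.2]
    priors = [0.6, 0.3, 0.1]  # unbalanced training-set class frequencies

    original = softmax([a + b for a, b in zip(activations, biases)])
    new_biases = normalize_biases(biases, priors)
    adjusted = softmax([a + b for a, b in zip(activations, new_biases)])
    explicit = rescale_posteriors(original, priors)
    # The bias change reproduces the explicit division by the priors.
    print(all(abs(x - y) < 1e-12 for x, y in zip(adjusted, explicit)))  # True
```

The appeal of the bias form is that the network weights stay untouched: only one scalar per output unit changes, which matches the abstract's point that the method can be applied to already trained nets. Under these assumptions, rare classes (small P(q)) receive a boost relative to frequent ones.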
