Discriminative training of HMM using MASPER procedure

This article focuses on incorporating discriminative training into the MASPER multilingual training procedure, which requires several modifications. It then tests and evaluates the performance of discriminative criteria such as Maximum Mutual Information (MMI) and Minimum Phone Error (MPE), the application of the I-smoothing technique, the setting of the convergence parameter, and the benefits of discriminative training for different hidden Markov models (HMMs). It also gives an overview of discriminative training strategies and their relation to classical Maximum Likelihood (ML) estimation. All experiments were carried out on the Slovak part of the MobilDat training database, which contains a wide range of noises and GSM-specific distortions. The results show that discriminative training, if properly adjusted, can improve performance over ML training by 5% on average, depending on model complexity, training strategy and deployment scenario. Finally, MPE, when properly set, may outperform MMI; however, it is more sensitive to parameter settings, the models used and the application domain.
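
For orientation, the three training criteria named above can be written in their commonly used textbook forms. This is a sketch under stated assumptions rather than the exact formulation used in the article: the notation is introduced here for illustration, with O_r the r-th training utterance, w_r its reference transcription, M_w the composite HMM for word sequence w, P(w) the language model probability, \kappa an acoustic scaling factor, and A(w, w_r) the raw phone accuracy of hypothesis w against the reference.

```latex
% ML: maximise the likelihood of the reference transcriptions only
\mathcal{F}_{\mathrm{ML}}(\lambda) = \sum_{r=1}^{R} \log p_{\lambda}(O_r \mid M_{w_r})

% MMI: maximise the posterior of the reference against all competing hypotheses
\mathcal{F}_{\mathrm{MMI}}(\lambda) = \sum_{r=1}^{R} \log
  \frac{p_{\lambda}(O_r \mid M_{w_r})^{\kappa} \, P(w_r)}
       {\sum_{w} p_{\lambda}(O_r \mid M_{w})^{\kappa} \, P(w)}

% MPE: maximise the expected phone accuracy under the hypothesis posterior
\mathcal{F}_{\mathrm{MPE}}(\lambda) = \sum_{r=1}^{R}
  \frac{\sum_{w} p_{\lambda}(O_r \mid M_{w})^{\kappa} \, P(w) \, A(w, w_r)}
       {\sum_{w'} p_{\lambda}(O_r \mid M_{w'})^{\kappa} \, P(w')}
```

In these forms the ML criterion depends only on the correct model sequence, whereas MMI and MPE also depend on competing hypotheses through the denominator sums, which is what makes them discriminative.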
