HCRF-based model compensation for noisy speech recognition

Hidden conditional random fields (HCRFs) are a class of discriminative models for pattern classification. They are derived from the conditional random field framework and have shown advantages for acoustic modeling in speech recognition. This paper extends the HCRF methodology to develop a robust technique for noisy speech recognition. We map the linear-chain structure of the HCRF onto its associated HMM and then approximate the Gaussian mixture models of the HMM with a Taylor expansion. This yields the statistical relation between the HCRF and the HMM, from which we propose an operative transformation for adapting the seed HCRFs to a set of noise-matched HCRFs. This study addresses the following related issues: (1) how to implement HCRF-based compensation for noisy environments; (2) how to integrate noise and channel bias compensation in the HCRF framework; and (3) how HMM-based and HCRF-based systems compare on noisy mixed-lingual (Mandarin and English) speech recognition. The experimental results indicate that the proposed HCRF-based model compensation framework shows potential for robust speech recognition.
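To make the Taylor-expansion step concrete, the following is a minimal sketch of a generic first-order (vector Taylor series style) compensation of a single diagonal Gaussian in the log-spectral domain. It is not the transformation proposed in this paper: the mismatch function y = x + h + log(1 + exp(n - x - h)), the function name `compensate_gaussian`, and all parameter names are assumptions chosen for illustration.

```python
import numpy as np

def compensate_gaussian(mu_x, var_x, mu_n, var_n, h=None):
    """Adapt a clean-speech diagonal Gaussian to additive noise and channel bias.

    Assumed (hypothetical) setup: log-spectral features with mismatch function
    y = x + h + log(1 + exp(n - x - h)), linearized around the means.
    """
    mu_x = np.asarray(mu_x, dtype=float)
    var_x = np.asarray(var_x, dtype=float)
    mu_n = np.asarray(mu_n, dtype=float)
    var_n = np.asarray(var_n, dtype=float)
    if h is None:
        h = np.zeros_like(mu_x)  # no channel bias by default
    h = np.asarray(h, dtype=float)

    # Difference between noise mean and channel-shifted speech mean.
    d = mu_n - mu_x - h

    # Zeroth-order term: mismatch function evaluated at the expansion point.
    # logaddexp(0, d) computes log(1 + exp(d)) in a numerically stable way.
    mu_y = mu_x + h + np.logaddexp(0.0, d)

    # First-order term: G = dy/dx at the expansion point, so dy/dn = 1 - G.
    G = 1.0 / (1.0 + np.exp(d))

    # Variances propagate through the linearized mismatch function.
    var_y = G ** 2 * var_x + (1.0 - G) ** 2 * var_n
    return mu_y, var_y

if __name__ == "__main__":
    mu_y, var_y = compensate_gaussian(
        mu_x=[0.0, 2.0], var_x=[1.0, 1.0],
        mu_n=[0.0, -50.0], var_n=[1.0, 1.0],
    )
    print(mu_y, var_y)
```

In this sketch, a dimension where the noise mean lies far below the speech mean is left essentially unchanged, while a dimension where the two means are equal has its mean shifted by log 2. Applying such an update to every mixture component of the associated HMM is one plausible way to obtain noise-matched parameters before mapping them back to HCRF parameters.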
