Joint uncertainty decoding with the second order approximation for noise robust speech recognition

Joint uncertainty decoding has recently achieved promising results by integrating the front-end uncertainty into the back-end in a mathematically consistent framework. In this paper, joint uncertainty decoding is compared with the widely used vector Taylor series (VTS) approach. We show that the two methods are identical except that joint uncertainty decoding applies the Taylor expansion once per regression class, whereas VTS applies it to each HMM mixture component. The coarser expansion points used in joint uncertainty decoding make it computationally cheaper than VTS but inevitably less accurate in recognition. To overcome this drawback, this paper proposes an improved joint uncertainty decoding algorithm that employs a second-order Taylor expansion on each regression class to reduce the expansion error. To limit the overall computational cost, different numbers of regression classes are used for the different orders of the Taylor expansion. Experiments on the Aurora 2 database show that the proposed method outperforms VTS in both recognition accuracy and computational cost, with relative improvements of up to 6% and 60%, respectively.
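The effect of the expansion point can be illustrated with a toy scalar sketch. It is not the paper's algorithm: it assumes the standard log-spectral mismatch function y = x + log(1 + exp(n - x)) with no channel term, treats a single "regression class" whose centroid is shared by five hypothetical component means, and ignores variances entirely. All names and values (`mismatch`, `taylor_mean`, `component_means`, `centroid`, `noise_mean`) are made up for illustration.

```python
import numpy as np

def mismatch(x, n):
    """Assumed log-spectral mismatch: noisy speech y from clean speech x and noise n."""
    return x + np.log1p(np.exp(n - x))

def taylor_mean(mu_x, mu0, n, order=1):
    """Approximate mismatch(mu_x, n) by a Taylor expansion in x around mu0."""
    g = 1.0 / (1.0 + np.exp(n - mu0))      # first derivative dy/dx at mu0
    d = mu_x - mu0                          # distance from the expansion point
    y = mismatch(mu0, n) + g * d            # first-order expansion
    if order >= 2:
        y = y + 0.5 * g * (1.0 - g) * d**2  # second derivative is g * (1 - g)
    return y

# Hypothetical clean-speech component means that share one regression class.
component_means = np.array([8.0, 9.0, 10.0, 11.0, 12.0])
centroid = component_means.mean()           # class-level expansion point
noise_mean = 10.0

exact = mismatch(component_means, noise_mean)

# Per-component expansion (VTS-like): each mean is its own expansion point.
per_component = taylor_mean(component_means, component_means, noise_mean)

# Per-class expansion (JUD-like): one shared expansion point, first vs second order.
err_first = np.abs(taylor_mean(component_means, centroid, noise_mean, order=1) - exact)
err_second = np.abs(taylor_mean(component_means, centroid, noise_mean, order=2) - exact)

print("first-order error per component: ", err_first)
print("second-order error per component:", err_second)
```

In this sketch the per-component expansion is exact at each mean by construction, while the shared expansion point introduces an error that grows with the distance from the centroid; adding the second-order term shrinks that error without adding expansion points, which is the trade-off the abstract describes.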
