Predictive linear transforms for noise robust speech recognition

It is well known that additive background noise alters the correlations between the elements of the feature vector, for example MFCCs. However, standard model-based compensation techniques do not modify the feature space in which the diagonal-covariance Gaussian mixture models are estimated. One solution to this problem, which yields good performance, is joint uncertainty decoding (JUD) with full transforms. Unfortunately, this results in a high computational cost during decoding. This paper contrasts two approaches to approximating full JUD while lowering the computational cost. Both use predictive linear transforms to modify the feature space: adaptation-based linear transforms, where the model parameters are restricted to be the same as those of the original clean system; and precision matrix modelling approaches, in particular semi-tied covariance matrices. These predictive transforms are estimated using statistics derived from the full JUD transforms rather than from noisy data. The schemes are evaluated on AURORA 2 and a noise-corrupted Resource Management task.
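To illustrate why full JUD transforms are costly, the following is a minimal sketch and not the paper's implementation. It assumes the commonly used JUD likelihood form, |A| N(Ay + b; mu, Sigma + Sigma_b), in which a full bias covariance Sigma_b turns a diagonal component covariance into a full one. The function name `jud_log_likelihood`, its `diagonalise` flag, and all toy values are hypothetical and chosen only for illustration.

```python
import numpy as np

def jud_log_likelihood(y, mu, var, A, b, sigma_b, diagonalise=False):
    """Log-likelihood of a noisy observation y for one clean Gaussian component.

    Assumed JUD form (an assumption of this sketch):
        |A| * N(A y + b; mu, diag(var) + sigma_b)
    """
    # The compensated covariance is full whenever sigma_b is full.
    cov = np.diag(var) + sigma_b
    if diagonalise:
        # Cheap approximation: discard the off-diagonal terms.
        cov = np.diag(np.diag(cov))
    diff = A @ y + b - mu
    _, logdet_a = np.linalg.slogdet(A)      # log |det A|
    _, logdet_cov = np.linalg.slogdet(cov)  # log det of the covariance
    quad = diff @ np.linalg.solve(cov, diff)
    return logdet_a - 0.5 * (len(y) * np.log(2.0 * np.pi) + logdet_cov + quad)

# Toy two-dimensional values (hypothetical, not taken from the paper).
mu = np.array([0.0, 1.0])
var = np.array([1.0, 2.0])
A = np.array([[0.9, 0.1], [0.0, 0.8]])
b = np.array([0.1, -0.2])
sigma_b = np.array([[0.5, 0.3], [0.3, 0.4]])
y = np.array([0.5, 1.5])

full_ll = jud_log_likelihood(y, mu, var, A, b, sigma_b)
diag_ll = jud_log_likelihood(y, mu, var, A, b, sigma_b, diagonalise=True)
print(full_ll, diag_ll)
```

In this sketch the full-covariance path needs a matrix solve per component, while the diagonalised path ignores the correlations introduced by the noise. The predictive transforms discussed in the abstract aim to retain those correlations in a transformed feature space at lower decoding cost.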
