Environment mismatch compensation using average eigenspace for speech recognition

The performance of speech recognition systems degrades when training and testing environmental conditions are mismatched. Beyond test data from noisy environments, there are scenarios where the training data itself is noisy. Speech enhancement techniques that focus solely on estimating clean speech from the noisy signal are not effective here, and model adaptation techniques may also fail because of the dynamic nature of the environment. In this paper, we propose a method for compensating the mismatch between training and testing environments using the "average eigenspace" approach when the mismatch is non-stationary. No explicit adaptation data is required, since the method operates on the incoming test data to find the compensatory transform. This method differs from traditional signal/noise subspace filtering techniques, which assume that the clean-signal space has lower dimensionality than the noise space and that noise affects all dimensions to the same extent. We evaluate this approach on two corpora collected in real car environments: CU-Move and UTDrive. Using Sphinx, a relative reduction of 40-50% in word error rate (WER) is achieved compared to the baseline system. The method also reduces the dimensionality of the feature vectors, allowing a more compact set of acoustic models in the phoneme space.
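The abstract does not spell out the transform, but the core idea of an "average eigenspace" can be sketched as follows: estimate the feature covariance in each environment, average the covariances, and eigendecompose the average to obtain a shared projection that also reduces dimensionality. This is a minimal illustrative sketch, not the paper's exact algorithm; the function name, the plain covariance averaging, and the top-k truncation are all assumptions introduced here for illustration.

```python
import numpy as np

def average_eigenspace_transform(train_feats, test_feats, k):
    """Hypothetical sketch: eigendecompose the average of the training
    and test feature covariances and keep the top-k eigenvectors as a
    compensating, dimension-reducing projection (an assumption, not the
    paper's exact method)."""
    # Per-environment covariance matrices; rows are frames, columns are dims
    cov_train = np.cov(train_feats, rowvar=False)
    cov_test = np.cov(test_feats, rowvar=False)
    # "Average eigenspace": eigendecompose the mean covariance
    avg_cov = 0.5 * (cov_train + cov_test)
    eigvals, eigvecs = np.linalg.eigh(avg_cov)
    # eigh returns ascending eigenvalues; keep the k largest-variance axes
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]  # (dims x k) projection matrix

# Usage: project both training and test features into the shared subspace
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 13))        # e.g. 13-dim cepstral frames
test = rng.normal(size=(300, 13)) + 0.5   # mismatched test environment
P = average_eigenspace_transform(train, test, k=8)
train_proj = train @ P
test_proj = test @ P
```

Projecting both sets through `P` yields the compact feature space the abstract refers to; in practice one would retrain or retransform the acoustic models in this reduced space.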
