Paper information - Noise and speaker compensation in the Log filter bank domain

Noise and speaker compensation in the Log filter bank domain

In this paper, we propose a method to compensate for noise and speaker variability directly in the Log filter-bank (FB) domain, so that the resulting MFCC features are robust to noise and speaker variations. For noise compensation, we use the Vector Taylor Series (VTS) approach in the Log FB domain; speaker normalization is also done in the Log FB domain, using linear vocal tract length normalization (VTLN) matrices. For VTLN, the optimal warp factor is selected in the Log FB domain using a canonical GMM, avoiding the two-pass approach needed with an HMM. Further, this selection can be implemented efficiently using sufficient statistics obtained from the GMM together with the FB-VTLN matrices. Warp-factor selection using the GMM can also be done in the cepstral domain by applying DCT matrices, without the usual approximations associated with conventional linear VTLN. The elegance of the proposed approach is that, given the speech data, we directly obtain MFCC features that are robust to noise and speaker variations. The proposed approach shows a significant relative improvement of 31% over the baseline on the Aurora-4 task.
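As a rough illustration of the warp-factor selection step described in the abstract, the sketch below scores each candidate warp factor by the GMM log-likelihood of the linearly transformed Log FB features and keeps the best one. It is a minimal sketch under stated assumptions, not the authors' implementation: `toy_warp_matrix` is a hypothetical stand-in for the FB-VTLN matrices, the log-determinant (Jacobian) term is assumed as the standard normalization for a linear feature transform, and a brute-force grid search is used in place of the sufficient-statistics implementation the abstract mentions. All function names are illustrative.

```python
import numpy as np


def toy_warp_matrix(alpha, dim):
    """Hypothetical linear warp matrix (NOT the paper's FB-VTLN matrix).

    Row i linearly interpolates the input channels around position alpha * i.
    It is only well conditioned for small dim and alpha close to 1.
    """
    A = np.zeros((dim, dim))
    for i in range(dim):
        pos = min(alpha * i, dim - 1)
        lo = int(np.floor(pos))
        hi = min(lo + 1, dim - 1)
        frac = pos - lo
        A[i, lo] += 1.0 - frac
        A[i, hi] += frac
    return A


def gmm_loglik(x, weights, means, variances):
    """Total log-likelihood of frames x (T, D) under a diagonal-covariance GMM."""
    diff = x[:, None, :] - means[None, :, :]              # (T, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_comp = log_comp + np.log(weights)                 # (T, M)
    mx = log_comp.max(axis=1, keepdims=True)              # log-sum-exp for stability
    per_frame = mx[:, 0] + np.log(np.sum(np.exp(log_comp - mx), axis=1))
    return float(np.sum(per_frame))


def select_warp_factor(log_fb, warp_matrices, weights, means, variances):
    """Grid search: pick the alpha whose matrix maximizes the GMM likelihood.

    The term T * log|det A| is the assumed Jacobian normalization for a
    linear transform applied to each of the T frames.
    """
    n_frames = log_fb.shape[0]
    scores = {}
    for alpha, A in warp_matrices.items():
        _, logdet = np.linalg.slogdet(A)
        warped = log_fb @ A.T
        scores[alpha] = gmm_loglik(warped, weights, means, variances) + n_frames * logdet
    best = max(scores, key=scores.get)
    return best, scores


def dct_matrix(n_ceps, n_fb):
    """Unnormalized DCT-II matrix mapping Log FB vectors to cepstra."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_fb)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_fb)
```

Given noise-compensated Log FB frames `log_fb` of shape `(T, D)`, one would build `warp_matrices = {a: toy_warp_matrix(a, D) for a in candidate_alphas}`, call `select_warp_factor`, and then obtain cepstra as `(log_fb @ warp_matrices[best].T) @ dct_matrix(n_ceps, D).T`. Applying the DCT only after warping mirrors the abstract's point that normalization happens in the Log FB domain and MFCCs are derived from the normalized features.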

Vikas Joshi | Srinivasan Umesh | Raghavendra Bilgi | Luz García | M. Carmen Benítez
