On enhancing feature sequence filtering with filter-bank energy transformation in speaker verification with telephone speech

This paper presents a novel feature enhancement method for channel robustness with short utterances. The transform reduces the time-varying component of the channel distortion by applying a band-pass filter across the filter-bank energies of each frame. This procedure strengthens the channel-cancelling effect of techniques based on feature trajectory filtering. The transformation parameters are chosen through a relative importance analysis based on a discriminant function. In text-dependent speaker verification with telephone speech, the transform reduces the equal error rate (EER) by 10.8%; combining it with RASTA or cepstral mean normalization (CMN) yields further improvements of 23.5% and 40%, respectively.
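The two filtering directions mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the kernel `[1, 0, -1]`, the edge padding, and the function names `filter_along_bands` and `mean_normalize_over_time` are assumptions made for illustration, whereas the paper derives its transformation parameters from a discriminant-based relative importance analysis. The second function subtracts the mean over time, which is the operation CMN applies to cepstral features; here it is applied directly to the filtered log energies for simplicity.

```python
import numpy as np

def filter_along_bands(log_fbe, kernel):
    """Filter each frame's log filter-bank energies across the band index.

    log_fbe: array of shape (n_frames, n_bands).
    kernel:  odd-length FIR kernel (hypothetical; not the paper's parameters).
    """
    log_fbe = np.asarray(log_fbe, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    pad = len(kernel) // 2
    # Repeat the edge bands so the output keeps n_bands values per frame.
    padded = np.pad(log_fbe, ((0, 0), (pad, pad)), mode="edge")
    return np.stack([np.convolve(frame, kernel, mode="valid") for frame in padded])

def mean_normalize_over_time(features):
    """Subtract the per-dimension mean over time (trajectory filtering, as in CMN)."""
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(50, 14))           # synthetic log energies
    fixed_channel = rng.normal(size=(1, 14))    # same in every frame
    frame_offset = rng.normal(size=(50, 1))     # varies frame to frame
    distorted = clean + fixed_channel + frame_offset

    kernel = [1.0, 0.0, -1.0]  # output[k] = x[k+1] - x[k-1], zero gain at DC
    enhanced = mean_normalize_over_time(filter_along_bands(distorted, kernel))
    reference = mean_normalize_over_time(filter_along_bands(clean, kernel))
    print(np.allclose(enhanced, reference))
```

In this toy setting, the filter across bands cancels an offset that changes from frame to frame but is constant across bands, and the mean subtraction over time cancels the offset that is fixed in every frame. Real channel distortion is not limited to these two idealized components, so the sketch only shows why the two steps complement each other.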
