Noise Robust Speaker Verification based on the MFCC and pH Features Fusion and Multicondition Training

This paper investigates the fusion of Mel-frequency cepstral coefficients (MFCC) and pH features, combined with a multicondition training (MT) technique based on artificial colored-spectrum noises, for noise-robust speaker verification. The α-integrated Gaussian mixture model (α-GMM), an extension of the conventional GMM, is used in the speaker verification experiments. Five real acoustic noises are used to corrupt the test speech signals at different signal-to-noise ratios (SNR). The experimental results show that MFCC + pH feature vectors improve the accuracy of speaker verification systems over MFCC alone. They also show that the system combining the MFCC + pH fusion, the α-GMM, and the MT technique achieves the best performance for speaker verification in noisy environments.
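
As an illustration of how a speech signal can be corrupted with noise at a chosen SNR, the following is a minimal sketch and not the authors' implementation. It assumes additive mixing and an SNR defined as the ratio of the mean powers of the whole speech and noise signals. The function names (`mix_at_snr`, `measured_snr_db`) and the synthetic sine and Gaussian signals are hypothetical stand-ins for real speech and real acoustic noise recordings.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling the noise to reach snr_db.

    Returns (mixed, scaled_noise). Assumes additive noise and a global
    (whole-signal) power ratio as the SNR definition.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Choose gain g so that p_speech / (g**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


def measured_snr_db(speech, noise):
    """SNR in dB between a clean signal and the noise added to it."""
    return 10 * math.log10(power(speech) / power(noise))


# Synthetic stand-ins: a 220 Hz tone sampled at 8 kHz and white Gaussian noise.
random.seed(0)
speech = [math.sin(2 * math.pi * 220 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]

mixed, scaled = mix_at_snr(speech, noise, snr_db=10.0)
print(measured_snr_db(speech, scaled))  # close to the requested 10 dB
```

Repeating the call over several SNR values and noise types would produce a set of noisy test conditions of the kind the abstract describes.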
