Delta-MFCC Features and Information Theoretic Expectation Maximization based Text-independent Speaker Verification System

Gaussian Mixture Model (GMM)-based speaker models yield very good performance in text-independent speaker verification systems. GMMs use expectation maximization (EM) as the optimization procedure for training speaker models. This paper proposes information theoretic expectation maximization (ITEM), which trains the speaker models with improved convergence rates. The approach is termed information theoretic (IT) because it uses Parzen density estimation and the Kullback-Leibler (KL) divergence measure. Because EM can converge slowly, an IT procedure is incorporated to improve its convergence rate. Like the conventional EM algorithm, the proposed ITEM algorithm adapts means, covariances, and weights. However, it does so not directly on the feature vectors but on a smaller set of centroids derived by the IT procedure. That procedure simultaneously minimizes the divergence between the Parzen estimate of the feature vectors' distribution within a given class and the distribution of the centroids within the same class. The ITEM algorithm was applied to the speaker verification problem using the NIST 2001, 2002, 2004, and 2006 speaker recognition evaluation corpora, with MFCC, delta, energy, and zero-crossing features. The results showed an improvement in the equal error rate (EER) over the classical EM approach. The ITEM method also showed higher convergence rates than the EM method.
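The abstract does not give the ITEM cost function or update rules. What follows is only a minimal one-dimensional sketch of the idea it describes:

- Build Parzen estimates on the feature data and on a small set of centroids.
- Measure the KL divergence between the two estimates.
- Move the centroids to reduce that divergence.

Several details are assumptions chosen for illustration, not the authors' procedure: the Gaussian kernels with a shared bandwidth `sigma`, the resubstitution KL estimator, and the fixed-point update. All helper names are hypothetical.

```python
import math
from typing import List

SQRT_2PI = math.sqrt(2.0 * math.pi)


def gauss(u: float, sigma: float) -> float:
    """1-D Gaussian kernel with bandwidth sigma (assumed kernel choice)."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * SQRT_2PI)


def parzen(x: float, points: List[float], sigma: float) -> float:
    """Parzen window density estimate at x built from `points`."""
    return sum(gauss(x - p, sigma) for p in points) / len(points)


def kl_resub(data: List[float], centroids: List[float], sigma: float) -> float:
    """Resubstitution estimate of KL(p || q), where p is the Parzen estimate
    of the feature data and q is the Parzen estimate built on the centroids."""
    return sum(
        math.log(parzen(x, data, sigma) / parzen(x, centroids, sigma))
        for x in data
    ) / len(data)


def update_centroids(data: List[float], centroids: List[float],
                     sigma: float) -> List[float]:
    """One fixed-point step on the centroid positions (hypothetical update).

    Only q depends on the centroids, so lowering kl_resub means raising
    sum(log q(x)). This step is an EM update for an equal-weight,
    fixed-bandwidth mixture, so it never increases kl_resub."""
    q = [parzen(x, centroids, sigma) for x in data]
    new = []
    for w in centroids:
        a = [gauss(x - w, sigma) / qx for x, qx in zip(data, q)]
        total = sum(a)
        # Keep a centroid in place if no data point gives it any weight.
        new.append(sum(ai * x for ai, x in zip(a, data)) / total
                   if total > 0 else w)
    return new
```

Take data clustered around -2 and 2, with centroids starting at -0.5 and 0.5. A few calls to `update_centroids` move the centroids toward the two clusters while `kl_resub` goes down. A real system would instead work with multivariate MFCC vectors and many more centroids per speaker.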
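According to the abstract, ITEM adapts means, covariances, and weights like conventional EM, but it works on centroids rather than on the raw feature vectors. One way to sketch that idea in 1-D is count-weighted EM for a GMM. This assumes, hypothetically, that each point (centroid) carries a count of the feature vectors it represents. The `counts` weighting, the variance floor, and the function name are all assumptions, and the paper's actual ITEM update may differ.

```python
import math
from typing import List, Tuple


def weighted_gmm_em(
    points: List[float],
    counts: List[float],
    means: List[float],
    variances: List[float],
    weights: List[float],
    iters: int = 50,
    var_floor: float = 1e-3,
) -> Tuple[List[float], List[float], List[float]]:
    """EM for a 1-D GMM where points[i] stands for counts[i] feature vectors
    (hypothetical weighting scheme)."""
    means, variances, weights = list(means), list(variances), list(weights)
    n_total = sum(counts)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in points:
            lik = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            s = sum(lik) or 1e-300  # guard against total underflow
            resp.append([l / s for l in lik])
        # M-step: count-weighted re-estimation of weights, means, variances.
        for k in range(len(means)):
            nk = sum(c * r[k] for c, r in zip(counts, resp))
            if nk <= 0:
                continue  # component lost all support; leave it unchanged
            mk = sum(c * r[k] * x for c, r, x in zip(counts, resp, points)) / nk
            vk = sum(c * r[k] * (x - mk) ** 2
                     for c, r, x in zip(counts, resp, points)) / nk
            means[k] = mk
            variances[k] = max(vk, var_floor)
            weights[k] = nk / n_total
    return means, variances, weights
```

Variances estimated from centroid positions alone ignore how feature vectors spread around each centroid. A practical version would likely need to add that within-centroid variance back in.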
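The abstract lists delta coefficients among the features but does not state how they are computed. A common choice (used, for example, in HTK) is the regression formula below, with window half-width N:

d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2)

The sketch assumes N = 2 and replicates the first and last frames at the edges. Frames are plain lists of cepstral coefficients, and both function names are hypothetical.

```python
from typing import List


def delta(frames: List[List[float]], N: int = 2) -> List[List[float]]:
    """Regression-based delta coefficients; edge frames are replicated."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                ahead = frames[min(t + n, T - 1)][d]   # clamp at last frame
                behind = frames[max(t - n, 0)][d]      # clamp at first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def append_deltas(frames: List[List[float]], N: int = 2) -> List[List[float]]:
    """Concatenate each static frame with its delta vector."""
    return [f + d for f, d in zip(frames, delta(frames, N))]
```

Consider a linear ramp of single-coefficient frames 0, 1, 2, 3, 4. The interior frame gets a delta of 1.0, and the edge frames get 0.5 because of the replication.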
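Energy and zero-crossing features are also listed without definitions. The per-frame sketch below rests on these assumptions:

- Log energy is the log of the sum of squared samples, with a small floor.
- The zero-crossing rate is the fraction of adjacent sample pairs whose sign differs.
- A sample equal to zero counts as non-negative.

Framing, windowing, and normalization conventions are left out, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def frame_energy_zcr(frame: List[float]) -> Tuple[float, float]:
    """Return (log energy, zero-crossing rate) for one frame of >= 2 samples."""
    energy = sum(x * x for x in frame)
    log_energy = math.log(energy + 1e-10)  # floor avoids log(0) on silent frames
    # Count adjacent pairs whose signs differ (zero treated as non-negative).
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return log_energy, zcr
```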
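The results are reported as an equal error rate (EER), the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). A simple sketch sweeps every observed score as a threshold. It then returns the mean of FAR and FRR at the threshold where the two are closest. This is an approximation for illustration, not the official NIST scoring tool, and the function name is hypothetical.

```python
from typing import List


def equal_error_rate(target: List[float], impostor: List[float]) -> float:
    """Approximate EER by sweeping each observed score as a threshold.

    A trial is accepted when its score >= threshold."""
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(target + impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < thr for s in target) / len(target)       # targets rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

As an example, target scores [0.3, 0.6, 0.7, 0.9] and impostor scores [0.1, 0.2, 0.4, 0.8] give an EER of 0.25 at a threshold of 0.6.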
