Using GMM with Embedded TDNN to Speaker Identification

This paper proposes a modified Gaussian Mixture Model (GMM) with an embedded Time Delay Neural Network (TDNN). The model combines the strengths of the GMM, a generative model, and the TDNN, a discriminative model. The TDNN captures the temporal information in the feature sequence, and by transforming the feature vectors it makes the variable-independence assumption required by maximum likelihood estimation more reasonable. The GMM and TDNN are trained as a whole under the maximum likelihood criterion, with the parameters of the GMM and the TDNN updated alternately during training. Experiments show that the proposed model improves the accuracy rate over the baseline GMM at all tested SNRs, by up to 21%.
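To make the scoring path concrete, the following is a minimal sketch of how an utterance might be scored by a TDNN-transformed GMM at identification time. It is an illustration under stated assumptions, not the authors' implementation: the single tanh time-delay layer, the diagonal covariances, the per-speaker TDNN, and all names (`tdnn_layer`, `gmm_avg_loglik`, `identify`, the dictionary keys) are hypothetical, and the alternating maximum likelihood training is not shown.

```python
import math

def tdnn_layer(frames, weights, bias, context=3):
    """One time-delay layer (assumed form): each output frame is computed
    from `context` consecutive input frames spliced into one vector."""
    out = []
    for t in range(len(frames) - context + 1):
        # Splice the frames in the delay window into a single input vector.
        window = [x for frame in frames[t:t + context] for x in frame]
        out.append([math.tanh(sum(w * x for w, x in zip(row, window)) + b)
                    for row, b in zip(weights, bias)])
    return out

def gmm_avg_loglik(frames, mix_weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for y in frames:
        comp = []
        for w, mu, var in zip(mix_weights, means, variances):
            ll = math.log(w)
            for yi, m, v in zip(y, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * v) + (yi - m) ** 2 / v)
            comp.append(ll)
        # Log-sum-exp over mixture components for numerical stability.
        top = max(comp)
        total += top + math.log(sum(math.exp(c - top) for c in comp))
    return total / len(frames)

def identify(frames, speakers):
    """Return the speaker whose TDNN + GMM pair gives the highest score.
    Assumption: every speaker model carries its own TDNN parameters."""
    scores = {}
    for name, m in speakers.items():
        transformed = tdnn_layer(frames, m["W"], m["b"], m["context"])
        scores[name] = gmm_avg_loglik(transformed, m["pi"], m["mu"], m["var"])
    return max(scores, key=scores.get), scores
```

In this sketch the GMM never sees the raw feature vectors, only the TDNN outputs, which is one way to read "embedded". During training, one would presumably alternate between re-estimating the GMM parameters with the TDNN fixed and updating the TDNN weights with the GMM fixed, as the abstract describes.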
