Speaker verification using Gaussian posteriorgrams on fixed phrase short utterances

This work explores speaker verification on fixed-phrase short utterances. We propose a novel speaker verification system based on Gaussian posteriorgrams, in which the posteriorgram vectors are computed from a speaker-specific Gaussian mixture model (GMM). The enrollment utterances of each target speaker are labeled with a GMM trained on that speaker's data, and each test trial is labeled with the claimed speaker's GMM. Dynamic time warping (DTW) is then used to compute a match score between the posteriorgrams of the claimed speaker's enrollment utterances and that of the test trial. The proposed approach is evaluated on the fixed pass-phrase subset of the recent RSR2015 database. For comparison, we have also developed a state-of-the-art i-vector system with a probabilistic linear discriminant analysis (PLDA) classifier. The proposed framework yields substantially better performance than the i-vector-based contrast system. We hypothesize that this large improvement stems from the use of speaker-specific variance information when generating the posteriorgram representations. Evaluating the proposed framework with non-speaker-specific variances resulted in a significant performance degradation, which confirmed our hypothesis.
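The abstract describes the scoring pipeline only at a high level. The sketch below shows one way it could look in Python with NumPy, assuming diagonal-covariance GMMs whose parameters (`weights`, `means`, `variances`) are already trained; feature extraction and GMM training are omitted. The local distance -log(p·q) between posterior vectors is a common choice for DTW on Gaussian posteriorgrams, but that this paper uses it, the length normalisation of the DTW cost, and averaging the cost over enrollment utterances are all assumptions. Every function name here is hypothetical, and the toy data are not speech features.

```python
import numpy as np


def log_gaussian_diag(frames, means, variances):
    """Log-density of every frame under every diagonal-covariance Gaussian.

    frames: (T, D); means, variances: (K, D). Returns a (T, K) array.
    """
    diff = frames[:, None, :] - means[None, :, :]                 # (T, K, D)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)     # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)      # (T, K)
    return -0.5 * (log_norm[None, :] + mahal)


def gaussian_posteriorgram(frames, weights, means, variances):
    """Per-frame posterior over GMM components; each row sums to 1."""
    log_joint = np.log(weights)[None, :] + log_gaussian_diag(frames, means, variances)
    log_joint -= log_joint.max(axis=1, keepdims=True)              # avoid exp underflow
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def dtw_cost(post_a, post_b, eps=1e-10):
    """Length-normalised DTW cost between two posteriorgrams (lower = more similar).

    Assumed local distance: -log of the inner product of posterior vectors.
    """
    local = -np.log(np.clip(post_a @ post_b.T, eps, None))        # (Ta, Tb)
    ta, tb = local.shape
    acc = np.full((ta + 1, tb + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[ta, tb] / (ta + tb)                                 # assumed normalisation


def verification_cost(test_frames, enrollment_utterances, gmm):
    """Hypothetical trial score: mean DTW cost between the test trial and each
    enrollment utterance, all labelled with the claimed speaker's GMM."""
    weights, means, variances = gmm
    test_post = gaussian_posteriorgram(test_frames, weights, means, variances)
    costs = [
        dtw_cost(gaussian_posteriorgram(u, weights, means, variances), test_post)
        for u in enrollment_utterances
    ]
    return float(np.mean(costs))


# Toy example: a 2-component GMM in a 2-D feature space (not real speech features).
gmm = (np.array([0.5, 0.5]),
       np.array([[0.0, 0.0], [5.0, 5.0]]),
       np.ones((2, 2)))
enroll = [np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [4.9, 5.1]])]
same_order = np.array([[0.1, 0.1], [5.0, 4.8], [5.1, 5.0]])
reversed_order = np.array([[5.0, 5.0], [5.1, 4.9], [0.0, 0.0]])
print(verification_cost(same_order, enroll, gmm) < verification_cost(reversed_order, enroll, gmm))
# prints True
```

Because DTW preserves temporal order, a trial that visits the GMM components in the enrolled order gets a much lower cost than one that visits the same components in reverse. To imitate the ablation with non-speaker-specific variances, one could pass a shared `variances` array (for instance from a background model) while keeping each speaker's own weights and means; the abstract does not say how the authors formed the shared variances. If the speaker GMMs are fitted with scikit-learn's `GaussianMixture(covariance_type="diag")`, its `weights_`, `means_` and `covariances_` attributes supply the three arrays, and `predict_proba` returns the same per-frame posteriors.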
