Source and System Features for Text Independent Speaker Recognition Using GMM Speaker Models

The main objective of this paper is to explore the effectiveness of perceptual features combined with pitch for text independent speaker recognition. In this algorithm, these features are captured and Gaussian mixture models are developed representing L feature vectors of speech for every speaker. Speakers are identified based on first finding posteriori probability density function between mixtures of speaker models and test speech vectors. Speakers are classified based on maximum probability density function which corresponds to a speaker model. This algorithm gives the good overall accuracy of 98% for mel frequency perceptual linear predictive cepstrum combined with pitch for identifying speaker among 8 speakers chosen randomly from 8 different dialect regions in “TIMIT” database by considering GMM speaker models of 12 mixtures. It also gives the better average accuracy of 95.75% for the same feature with respect to 8 speakers chosen randomly from the same dialect region for12 mixtures GMM speaker models. Mel frequency linear predictive cepstrum gives the better accuracy of 96.75% and 96.125% for GMM speaker models of 16 mixtures by considering speakers from different dialect regions and from same dialect region respectively. This algorithm is also evaluated for 4, 8 and 32 mixtures GMM speaker models. 12 mixtures GMM speaker models are tested for population of 20 speakers and the accuracy is found to be slightly less as compared to that for the the speaker population of 8 speakers. The noteworthy feature of speaker identification algorithm is to evaluate the testing procedure on identical messages for all the speakers. This work is extended to speaker verification whose performance is measured in terms of % False rejection rate, % False acceptance rate and % Equal error rate. % False acceptance rate and % Equal error rate are found to be less for mel frequency perceptual linear predictive cepstrum with pitch and % false rejection rate is less for mel frequency linear predictive cepstrum. In this work, F-ratio is computed as a theoretical measure on the features of the training speeches to validate the experimental results for perceptual features with pitch. χ 2 distribution tool is used to perform the statistical justification of good experimental results for all the features with respect to both speaker identification and verification.

[1]  Hynek Hermansky,et al.  Perceptually based processing in automatic speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Rong Zheng,et al.  Improvement of Speaker Identification by Combining Prosodic Features with Acoustic Features , 2004, SINOBIOMETRICS.

[3]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[4]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[5]  Y. Venkataramani,et al.  Use of perceptual features in iterative clustering based twins identification system , 2008, 2008 International Conference on Computing, Communication and Networking.

[6]  Hynek Hermansky,et al.  The challenge of inverse-E: the RASTA-PLP method , 1991, [1991] Conference Record of the Twenty-Fifth Asilomar Conference on Signals, Systems & Computers.

[7]  S. Arivazhagan,et al.  Fingerprint Verification Using Gabor Co-occurrence Features , 2007, International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007).

[8]  Y. Venkataramani,et al.  Iterative Clustering Approach for Text Independent Speaker Identification using Multiple Features , 2008, 2008 2nd International Conference on Signal Processing and Communication Systems.

[9]  Kishore Prahallad,et al.  Source and system features for speaker recognition using AANN models , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[10]  Goutam Saha,et al.  Improved Text-Independent Speaker Identification using Fused MFCC and IMFCC Feature Sets based on Gaussian Filter , 2009 .

[11]  Sharath Pankanti,et al.  Advances in Biometric Person Authentication, International Workshop on Biometric Recognition Systems, IWBRS2005, Beijing, China, October 22-23, 2005, Proceedings , 2005, IWBRS.

[12]  Y. Venkataramani,et al.  Text Independent Composite Speaker Identification/Verification Using Multiple Features , 2009, 2009 WRI World Congress on Computer Science and Information Engineering.

[13]  Y. Venkataramani,et al.  Effectiveness of LP Derived Features and DCTC in Twins Identification - Iterative Speaker Clustering Approach , 2007, International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007).

[14]  Danoush Hosseinzadeh,et al.  Combining Vocal Source and MFCC Features for Enhanced Speaker Recognition Performance Using GMMs , 2007, 2007 IEEE 9th Workshop on Multimedia Signal Processing.