Vector Quantization Approach for Speaker Recognition using MFCC and Inverted MFCC

The front-end, or feature extractor, is the first component in an automatic speaker recognition system. Feature extraction transforms the raw speech signal into a compact but effective representation that is more stable and discriminative than the original signal. Since the front-end is the first component in the chain, the quality of the later components (speaker modeling and pattern matching) is strongly determined by the quality of the front-end. In other words, classification can be at most as accurate as the features allow. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for speech-related applications. This paper shows that inverted Mel-Frequency Cepstral Coefficients (inverted MFCC), which capture complementary information from the high-frequency region, can enhance speaker recognition performance. The paper also introduces Gaussian-shaped filters (GF) in place of the traditional triangular-shaped bins when calculating MFCC and inverted MFCC. The main idea is to introduce a higher amount of correlation between subband outputs. The performance of both MFCC and inverted MFCC improves with GF over the traditional triangular filter (TF) based implementation, individually as well as in combination. The Vector Quantization (VQ) feature matching technique was used in this study because of its high accuracy and simplicity. The proposed approach achieved an efficiency of 98.57% with a very short test voice sample of 2 seconds.
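
The pipeline described above (Gaussian-shaped filters, an inverted filter bank, and VQ matching) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 8 kHz sampling rate, the 20 filters, the 12 cepstral coefficients, and the rule that sets each Gaussian's spread to the distance to the next mel-spaced boundary point divided by a hypothetical `alpha` are all assumptions made for illustration. The inverted bank is obtained here by flipping the mel-spaced bank along both the filter and frequency axes, so that resolution is finest at high frequencies.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def gaussian_filterbank(n_filters=20, n_fft=512, sr=8000, alpha=2.0, inverted=False):
    """Gaussian-shaped filters centred at mel-spaced points (rows = filters)."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 boundary points equally spaced in mel, expressed in FFT bins
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    edges = edges_hz / (sr / 2.0) * (n_bins - 1)
    centres = edges[1:-1]
    # Assumed spread: distance to the next boundary point divided by alpha
    sigma = (edges[2:] - centres) / alpha
    k = np.arange(n_bins)
    fb = np.exp(-0.5 * ((k[None, :] - centres[:, None]) / sigma[:, None]) ** 2)
    if inverted:
        # Flip filter order and frequency axis: narrow filters at high frequencies
        fb = fb[::-1, ::-1]
    return fb

def cepstra(frame, fb, n_ceps=12):
    """Log filter-bank energies of one frame followed by a DCT-II (c_0 dropped)."""
    n_fft = 2 * (fb.shape[1] - 1)
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    log_e = np.log(fb @ power + 1e-10)  # small floor avoids log(0)
    q = len(log_e)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), np.arange(q) + 0.5) / q)
    return basis @ log_e

def vq_distortion(features, codebook):
    """Average Euclidean distance from each feature vector to its nearest codeword."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

In a closed-set identification setting, one codebook per enrolled speaker would be trained beforehand (codebook training is not shown here), and a test utterance would be assigned to the speaker whose codebook gives the smallest `vq_distortion`. MFCC and inverted MFCC features would be computed with `gaussian_filterbank(inverted=False)` and `gaussian_filterbank(inverted=True)` respectively; how their scores are combined is left open in this sketch.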
