Performance Improvement of Speaker Recognition Using Enhanced Feature Extraction in Glottal Flow Signals and Multiple Feature Parameter Combination

In this paper, we use source mel-frequency cepstral coefficients (SMFCCs), skewness, and kurtosis extracted from glottal flow signals to improve speaker recognition performance. Because the high-band magnitude response of glottal flow signals is generally rather flat, the SMFCCs are extracted using only the response below a predefined cutoff frequency. The extracted SMFCCs, skewness, and kurtosis are concatenated with conventional feature parameters. The dimensionality of the combined feature vector is then reduced by principal component analysis (PCA) and linear discriminant analysis (LDA), so that performance can be compared with conventional systems under equivalent conditions. The proposed recognition system outperformed the conventional system in large-scale speaker recognition experiments. In particular, the performance improvement was more noticeable for models with a small number of Gaussian mixtures.
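
The feature combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the frame length, feature dimensions, random placeholder data, and the helper names `frame_moments` and `pca_reduce` are all assumptions made for illustration. Only the per-frame skewness and kurtosis, the concatenation, and a PCA projection are shown; SMFCC extraction and the LDA step are omitted.

```python
import numpy as np

def frame_moments(frames):
    """Per-frame skewness and (non-excess) kurtosis, shape (n_frames, 2)."""
    centered = frames - frames.mean(axis=1, keepdims=True)
    std = centered.std(axis=1, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # guard against constant frames
    z = centered / std
    skew = (z ** 3).mean(axis=1)
    kurt = (z ** 4).mean(axis=1)
    return np.column_stack([skew, kurt])

def pca_reduce(features, n_components):
    """Project mean-centered features onto the top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Placeholder data (assumed shapes, not taken from the paper).
rng = np.random.default_rng(0)
glottal_frames = rng.standard_normal((200, 400))  # stand-in for glottal flow frames
conventional = rng.standard_normal((200, 13))     # stand-in for conventional features
smfcc = rng.standard_normal((200, 12))            # stand-in for SMFCCs

# Concatenate conventional features, SMFCCs, skewness, and kurtosis per frame.
combined = np.hstack([conventional, smfcc, frame_moments(glottal_frames)])

# Reduce back to the conventional dimensionality for an equal-size comparison.
reduced = pca_reduce(combined, n_components=13)
```

Reducing the combined vector to the same dimension as the conventional features is one way to read "equivalent conditions"; the actual target dimension used in the paper is not stated in the abstract.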
