Effect of MFCC normalization on vector quantization based speaker identification

Mel Frequency Cepstral Coefficients (MFCC) are widely used in speech recognition and speaker identification. MFCC features are usually pre-processed before they are used for recognition. One such pre-processing step is computing delta and delta-delta coefficients and appending them to the MFCCs to form the feature vector. Another is coefficient mean normalization. In this paper, the effects of these two processes on the accuracy of a Vector Quantization (VQ) based speaker identification system are compared. Additionally, it is shown that coefficient variance normalization, which is less common, can improve the accuracy.
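
To make the compared pre-processing steps concrete, here is a minimal pure-Python sketch of the pipeline the abstract describes. It is an illustration under stated assumptions, not the authors' implementation. Deltas use the common regression formula with a window of N = 2 and clamped utterance edges. Mean and variance normalization are computed per utterance and per coefficient. Each speaker's codebook is trained with plain k-means (an assumed stand-in for whatever codebook design the paper used) with an arbitrary size of 8. An utterance is assigned to the speaker whose codebook gives the lowest average quantization distortion. All function names, including the synthetic `fake_utterance` frames that replace real MFCCs, are hypothetical.

```python
import math
import random


def deltas(frames, n=2):
    """Regression deltas: d_t = sum_k k*(c[t+k] - c[t-k]) / (2*sum_k k^2), k = 1..n.
    Frames beyond the utterance edges are clamped to the first/last frame (assumption)."""
    T, dim = len(frames), len(frames[0])
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        row = []
        for i in range(dim):
            acc = sum(k * (frames[min(t + k, T - 1)][i] - frames[max(t - k, 0)][i])
                      for k in range(1, n + 1))
            row.append(acc / denom)
        out.append(row)
    return out


def append_deltas(frames, n=2):
    """Static + delta + delta-delta, e.g. 13 MFCCs -> 39-dimensional vectors."""
    d = deltas(frames, n)
    dd = deltas(d, n)
    return [f + a + b for f, a, b in zip(frames, d, dd)]


def normalize(frames, variance=True, eps=1e-8):
    """Per-utterance mean normalization, optionally followed by variance normalization."""
    T, dim = len(frames), len(frames[0])
    means = [sum(f[i] for f in frames) / T for i in range(dim)]
    if variance:
        scales = [math.sqrt(sum((f[i] - means[i]) ** 2 for f in frames) / T) + eps
                  for i in range(dim)]
    else:
        scales = [1.0] * dim
    return [[(f[i] - means[i]) / scales[i] for i in range(dim)] for f in frames]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(frames, size=8, iters=20, seed=0):
    """Plain k-means codebook, initialized from randomly chosen training frames."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, size)]
    for _ in range(iters):
        clusters = [[] for _ in range(size)]
        for f in frames:
            nearest = min(range(size), key=lambda c: sq_dist(f, codebook[c]))
            clusters[nearest].append(f)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                codebook[c] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def avg_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def identify(frames, codebooks):
    """Return the speaker whose codebook quantizes the utterance with least distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk]))


def fake_utterance(pattern, n_frames, seed):
    """Hypothetical stand-in for MFCC frames: +/-pattern plus small Gaussian noise."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n_frames):
        s = rng.choice((-1.0, 1.0))
        frames.append([s * p + rng.gauss(0.0, 0.1) for p in pattern])
    return frames


def preprocess(frames):
    return normalize(append_deltas(frames), variance=True)


if __name__ == "__main__":
    patterns = {"A": (1.0, 1.0), "B": (1.0, -1.0)}
    codebooks = {spk: train_codebook(preprocess(fake_utterance(p, 200, seed=i)))
                 for i, (spk, p) in enumerate(patterns.items())}
    test = preprocess(fake_utterance(patterns["B"], 100, seed=42))
    print(identify(test, codebooks))
```

The configurations the abstract compares map onto small changes in `preprocess`. Passing `variance=False` gives mean normalization only, and calling `normalize` on the raw frames instead of `append_deltas(frames)` drops the delta and delta-delta coefficients. Results on this synthetic data say nothing about the accuracy differences reported in the paper.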

Sajad Shirali-Shahreza | Mohammad Hassan Shirali-Shahreza