Paper information - Improved Text-Independent Speaker Identification using Fused MFCC and IMFCC Feature Sets based on Gaussian Filter

A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for speech-related applications. In a recent contribution, the authors showed that Inverted Mel-Frequency Cepstral Coefficients (IMFCC) are a useful feature set for SI that contains complementary information present in the high-frequency region. This paper introduces a Gaussian-shaped filter (GF) for calculating MFCC and IMFCC in place of the typical triangular-shaped bins. The objective is to introduce a higher amount of correlation between subband outputs. The performance of both MFCC and IMFCC improves with GF over the conventional triangular filter (TF) based implementation, individually as well as in combination. With a Gaussian mixture model (GMM) as the speaker modeling paradigm, the performance of the proposed GF-based MFCC and IMFCC, in individual and fused modes, has been verified on two standard databases, YOHO (microphone speech) and POLYCOST (telephone speech), each of which has more than 130 speakers.

Keywords: Gaussian Filter, Triangular Filter, Subbands, Correlation, MFCC, IMFCC, GMM.
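The filter-shape idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the common 2595·log10(1 + f/700) mel mapping, a Gaussian width set by a hypothetical `alpha` parameter, an inverted bank obtained by mirroring the mel bank along the frequency axis, and all function names. The cosine similarity between neighbouring filter responses is used only as a rough proxy for the "correlation between subband outputs" that the abstract mentions.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel mapping (an assumption; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_edges(n_filters, f_min, f_max):
    # n_filters + 2 boundary frequencies in Hz, equally spaced on the mel scale.
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mels)

def triangular_bank(edges, freqs):
    # Conventional triangular filters: zero outside [edges[i], edges[i + 2]].
    n = len(edges) - 2
    bank = np.zeros((n, len(freqs)))
    for i in range(n):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def gaussian_bank(edges, freqs, alpha=2.0):
    # Gaussian filters centred on the same frequencies as the triangles.
    # sigma = (upper half-width) / alpha is a hypothetical choice, not the paper's.
    n = len(edges) - 2
    bank = np.zeros((n, len(freqs)))
    for i in range(n):
        mid = edges[i + 1]
        sigma = (edges[i + 2] - mid) / alpha
        bank[i] = np.exp(-0.5 * ((freqs - mid) / sigma) ** 2)
    return bank

def invert_bank(bank):
    # Mirror the bank along the frequency axis (assumes a uniform grid from
    # 0 to fs/2) and reverse the filter order so filter 0 stays lowest.
    return bank[::-1, ::-1]

def mean_adjacent_overlap(bank):
    # Mean cosine similarity between neighbouring filter responses.
    unit = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return float(np.mean(np.sum(unit[:-1] * unit[1:], axis=1)))

# Illustrative settings: 8 kHz sampling, 20 filters, 513 frequency bins.
freqs = np.linspace(0.0, 4000.0, 513)
edges = mel_edges(20, 0.0, 4000.0)
tf_mel = triangular_bank(edges, freqs)
gf_mel = gaussian_bank(edges, freqs)
tf_inv, gf_inv = invert_bank(tf_mel), invert_bank(gf_mel)

print("triangular overlap:", mean_adjacent_overlap(tf_mel))
print("gaussian overlap:  ", mean_adjacent_overlap(gf_mel))
```

With these assumed settings the Gaussian bank shows a larger neighbour overlap than the triangular one, because each Gaussian response extends past the adjacent centre frequencies instead of dropping to zero there. A smaller `alpha` widens the filters and raises the overlap further; a much larger `alpha` can reverse the effect.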

Goutam Saha | Sandipan Chakroborty
