On Factors Affecting MFCC-Based Speaker Recognition Accuracy

We evaluate the accuracy of an MFCC-based speaker recognition method, analysing recognition results on speech signals recorded in everyday environments. We study the effects of mismatch in text dependency, sample length, language, speaking style, cheating (disguise), microphone, sample quality, and noise. Experiments on a self-collected corpus of 30 subjects indicate that any mismatch degrades recognition accuracy. The dominant factors are noise, microphone, disguise, and degradation of the sampling rate and sample quality. Speech-related factors and sample length are less critical.
