Performance analysis of text-dependent speaker recognition system based on template model based classifiers

Speaker recognition technology is used in a variety of fields, including security systems, banking operations and forensic laboratories, owing to its easy implementation, flexibility and security. Each of these applications demands high performance, i.e. a high speaker recognition rate. In this paper, an MFCC-based text-dependent speaker recognition (SR) system is presented. The speaker's voice is modeled by its Mel-spaced frequency spectrum. The performance of the system has been evaluated using our own recorded subject database and compared across two template-model-based classifiers, Nearest Neighbor (NN) and Vector Quantization (VQ), in terms of recognition accuracy rate, system processing time, number of users and size of the training database. In the experiments, VQ performed better than NN on all of these parameters. Although both classifiers use Euclidean distance as the similarity function between training and testing data, NN is an instance-based (lazy) algorithm that does not build any model of the training data and hence lags behind VQ in performance, whereas VQ builds code vectors for each user by clustering as soon as the training data is fed in.
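The difference between the two classifiers can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic 12-dimensional vectors stand in for MFCC frames, plain k-means is assumed as the clustering step (the abstract does not name one), and the speaker names, the codebook size of 8 and all function names are hypothetical. Both classifiers score a test utterance by the average Euclidean distance from each test frame to its closest template vector; NN keeps every training frame as a template, while VQ keeps only a small codebook per speaker.

```python
import numpy as np

def pairwise_sq_dist(a, b):
    # Squared Euclidean distance between every row of a and every row of b.
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff * diff, axis=2)

def avg_min_distance(frames, templates):
    # Mean, over test frames, of the Euclidean distance to the closest template.
    d = np.sqrt(pairwise_sq_dist(frames, templates))
    return float(d.min(axis=1).mean())

def train_codebook(frames, k, iters=20, seed=0):
    # Plain k-means; an assumption, since the clustering method is not specified.
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = pairwise_sq_dist(frames, codebook).argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old code vector if a cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def identify(test_frames, templates_by_speaker):
    # Pick the speaker whose templates give the smallest average distance.
    scores = {s: avg_min_distance(test_frames, t)
              for s, t in templates_by_speaker.items()}
    return min(scores, key=scores.get)

# Synthetic stand-in for MFCC frames: 200 training frames per hypothetical speaker.
rng = np.random.default_rng(42)
centers = {"alice": 0.0, "bob": 5.0, "carol": -5.0}
train = {s: rng.normal(c, 1.0, size=(200, 12)) for s, c in centers.items()}
test_frames = rng.normal(centers["bob"], 1.0, size=(50, 12))

# NN: no training step, every stored frame is a template (lazy learning).
nn_templates = train
# VQ: each speaker's frames are compressed into 8 code vectors at training time.
vq_templates = {s: train_codebook(f, k=8) for s, f in train.items()}

nn_guess = identify(test_frames, nn_templates)
vq_guess = identify(test_frames, vq_templates)
print(nn_guess, vq_guess)
```

In this sketch each test frame is compared against 200 stored vectors per speaker under NN but only 8 under VQ, which is one plausible reason for a difference in processing time; the accuracy figures reported in the paper cannot be reproduced from synthetic data like this.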
