Performance Evaluation of Neural Networks for Speaker Recognition

Speaker recognition is one of the principal problems in speech processing. The performance of speaker recognition systems can be improved by carefully choosing and computing suitable features, which is an arduous task. Learning-based approaches have therefore been found to be simpler, more general and, with the rapid growth of artificial intelligence, more accurate. This paper is a comparative study of the performance of different neural networks in speaker recognition. The focus of this work is to determine which of these learning algorithms is the most accurate, the least complex, and the most generic for speaker recognition. A database of 5000 utterances, 100 for each of 50 different speakers, in both clean and noisy environments with varying levels of noise, was used. The Mel Frequency Cepstral Coefficients (MFCCs) of these utterances were used as features to train and evaluate the neural networks. As expected, the accuracy of all the neural networks was very high (>90%) on clean data, with large variations appearing once noise was introduced and its level changed. The radial basis function neural network (RBFNN) performed consistently well under all conditions. The deep neural network (DNN) was the other consistent performer and has the potential to outperform the other techniques if trained on more data.
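To make the evaluation setup concrete, the following is a minimal sketch of an RBF network classifier scored on clean and noisy feature vectors. It is not the implementation used in this study. Everything in it is an assumption made for illustration: the 13-dimensional synthetic vectors stand in for per-utterance MFCC features, and the speaker count, noise levels, `gamma`, the choice of one center per speaker, and all function names are hypothetical.

```python
import numpy as np


def rbf_features(X, centers, gamma):
    """Gaussian activations exp(-gamma * ||x - c||^2) for each sample/center pair."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)


def fit_rbfnn(X, y, n_classes, gamma):
    # Assumption: one center per speaker, placed at that speaker's mean vector.
    centers = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    hidden = rbf_features(X, centers, gamma)
    targets = np.eye(n_classes)[y]  # one-hot speaker labels
    # Linear output layer fitted by ordinary least squares.
    weights, *_ = np.linalg.lstsq(hidden, targets, rcond=None)
    return centers, weights


def predict_rbfnn(X, centers, weights, gamma):
    # Predicted speaker = output unit with the largest response.
    return (rbf_features(X, centers, weights.shape[0] and gamma) @ weights).argmax(axis=1)


def sample_utterances(rng, speaker_means, per_speaker, noise_std):
    # Synthetic stand-in for per-utterance MFCC vectors (not real speech data).
    n_speakers, _ = speaker_means.shape
    X = np.repeat(speaker_means, per_speaker, axis=0)
    X = X + noise_std * rng.normal(size=X.shape)
    y = np.repeat(np.arange(n_speakers), per_speaker)
    return X, y


def run_demo(seed=0, n_speakers=5, dim=13, gamma=0.05):
    rng = np.random.default_rng(seed)
    speaker_means = rng.normal(scale=5.0, size=(n_speakers, dim))
    X_train, y_train = sample_utterances(rng, speaker_means, 40, 0.5)
    X_clean, y_clean = sample_utterances(rng, speaker_means, 20, 0.5)
    X_noisy, y_noisy = sample_utterances(rng, speaker_means, 20, 1.5)  # higher noise level
    centers, weights = fit_rbfnn(X_train, y_train, n_speakers, gamma)
    clean_acc = (predict_rbfnn(X_clean, centers, weights, gamma) == y_clean).mean()
    noisy_acc = (predict_rbfnn(X_noisy, centers, weights, gamma) == y_noisy).mean()
    return clean_acc, noisy_acc
```

Calling `run_demo()` returns the clean and noisy accuracies as a pair. Repeating the call with larger `noise_std` values for the test set is one way to mimic the "varying levels of noise" comparison described above, though the toy data says nothing about how real MFCC features behave.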
