Mel Frequency Cepstral Coefficients for Speaker Recognition Using Gaussian Mixture Model-Artificial Neural Network Model

Speaker Recognition (SP) is a topic of great significance in the areas of intelligence and security. Biometric SP uses an automated method of verifying or recognizing the identity of a person on the basis of some characteristic, such as a fingerprint, a face pattern, or the human voice. Many methods proposed in the literature focus on front-end processing, such as PLP and LPC. In this paper, we study the applicability of Artificial Neural Networks (ANNs) as core classifiers, alongside Gaussian Mixture Models (GMMs), for Mel Frequency Cepstral Coefficient (MFCC) features. Two different approaches are compared. GMMs, which are commonly used in many application domains, are reviewed first. We also apply a sampled method for speaker recognition that is based on ANNs. The experimental results show that the GMM achieved higher accuracy than the ANN model. However, despite certain disadvantages that they present, mainly at the training stage, ANNs show better performance for speech and need less training data than GMM-based systems [1]. It is assumed that a hybrid of both models will perform better and merits further development.
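
As an illustration of the GMM side of the comparison, the following minimal sketch scores a sequence of feature vectors against per-speaker diagonal-covariance GMMs and selects the speaker with the highest total log-likelihood, a decision rule commonly used for GMM-based speaker identification. The model parameters, the speaker names, and the two-dimensional frames are hypothetical stand-ins. MFCC extraction and EM training are omitted, and the sketch is not claimed to reproduce the configuration used in the paper.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log(sum_k w_k * N(x | mu_k, var_k))."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in gmm]
        peak = max(comps)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total


def identify(frames, speaker_models):
    """Return the speaker whose GMM gives the highest total log-likelihood."""
    return max(
        speaker_models,
        key=lambda name: gmm_log_likelihood(frames, speaker_models[name]),
    )


# Hypothetical models: each component is (weight, mean vector, variance vector).
speaker_models = {
    "alice": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [2.0, 2.0], [0.5, 0.5])],
    "bob": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [7.0, 4.0], [0.5, 0.5])],
}

# Toy feature frames standing in for the MFCC vectors of one test utterance.
test_frames = [[0.2, -0.1], [1.8, 2.1], [0.5, 0.4]]
print(identify(test_frames, speaker_models))  # prints "alice"
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplifying assumption in text-independent GMM systems. An ANN-based system would instead map each frame (or a window of frames) directly to speaker posteriors.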

[1] J. P. Campbell Jr., et al. Speaker recognition: a tutorial, 1997, Proc. IEEE.

[2] Philipp Slusallek, et al. Introduction to real-time ray tracing, 2005, SIGGRAPH Courses.

[3] R. Palmer, et al. Introduction to the theory of neural computation, 1994, The advanced book program.

[4] Richard Lippmann, et al. Neural Network Classifiers Estimate Bayesian a posteriori Probabilities, 1991, Neural Computation.

[5] H. Bourlard, et al. Links Between Markov Models and Multilayer Perceptrons, 1990, IEEE Trans. Pattern Anal. Mach. Intell.

[6] S. Haykin, et al. Neural Networks: A Comprehensive Foundation, 1994.

[7] Javier Ortega-Garcia, et al. A comparative study of MLP-based artificial neural networks in text-independent speaker verification against GMM-based systems, 2001, INTERSPEECH.

[8] Jay M. Naik, et al. A hybrid HMM-MLP speaker verification algorithm for telephone speech, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[9] On Automatic Speaker Recognition, 2007.

[10] Laurene V. Fausett, et al. Fundamentals Of Neural Networks, 1994.

[11] Douglas A. Reynolds, et al. Robust text-independent speaker identification using Gaussian mixture speaker models, 1995, IEEE Trans. Speech Audio Process.

[12] Sadaoki Furui, et al. Fifty years of progress in speech and speaker recognition, 2004.

[13] P. Nurmi. Mixture Models, 2008.

[14] Richard J. Mammone, et al. Speaker recognition using neural networks and conventional classifiers, 1994, IEEE Trans. Speech Audio Process.

[15] Yonghong Yan, et al. Speech recognition using neural networks with forward-backward probability generated targets, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[16] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[17] J. Oglesby, et al. Optimisation of neural models for speaker identification, 1990, International Conference on Acoustics, Speech, and Signal Processing.