Speaker identification using vowels features through a combined method of formants, wavelets, and neural network classifiers

This paper proposes a new method for speaker feature extraction based on Formants, Wavelet Entropy and Neural Networks, denoted FWENN. In the first stage, five formants and seven Shannon wavelet packet entropies are extracted from the speakers' signals to form the speaker feature vector. In the second stage, these 12 feature coefficients are used as inputs to feed-forward neural networks. A probabilistic neural network is also considered for comparison. In contrast to conventional speaker recognition methods, which extract features from sentences (or words), the proposed method extracts the features from vowels. Advantages of using vowels include the ability to recognize speakers when only partially recorded words are available. This may be useful for deaf-mute persons or when the recordings are damaged. Experimental results show that the proposed method succeeds in the speaker verification and identification tasks with a high classification rate. This is accomplished with a minimal amount of information, using only 12 feature coefficients (i.e. the vector length) and only one vowel signal, which is the major contribution of this work. The results are further compared with well-known classical algorithms for speaker recognition and are found to be superior.
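To make the first stage concrete, the following is a minimal sketch of how a 12-element vector of five formants and seven Shannon wavelet packet entropies might be assembled for a single vowel segment. It is not the authors' implementation, and several choices are assumptions that the abstract does not specify: the Haar wavelet, a two-level packet tree whose seven nodes (root, two level-1 nodes, four level-2 nodes) supply the seven entropies, the non-normalized Shannon entropy, formant estimation from the roots of an order-12 LPC polynomial, and all function names. The input is a synthetic test signal, not speech data.

```python
import numpy as np

def shannon_entropy(c):
    """Non-normalized Shannon entropy -sum(c_i^2 * log(c_i^2)), taking 0*log(0) = 0."""
    p = np.asarray(c, dtype=float) ** 2
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def haar_split(x):
    """One Haar analysis step: returns (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, 0.0)  # zero-pad to an even length
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def wavelet_packet_entropies(x):
    """Entropies of the 7 nodes of a 2-level packet tree (assumed layout)."""
    a, d = haar_split(x)
    aa, ad = haar_split(a)
    da, dd = haar_split(d)
    nodes = [np.asarray(x, dtype=float), a, d, aa, ad, da, dd]
    return np.array([shannon_entropy(n) for n in nodes])

def lpc_formants(x, fs, order=12, n_formants=5):
    """Rough formant estimates in Hz from LPC root angles (autocorrelation method).

    No bandwidth filtering is applied, so this is only a crude estimate.
    """
    x = np.asarray(x, dtype=float)
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])   # pre-emphasis
    x = x * np.hamming(len(x))                   # taper the frame
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    r[0] *= 1.0 + 1e-9                           # tiny regularization
    lag = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lag], r[1 : order + 1])  # Toeplitz normal equations
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
    out = np.zeros(n_formants)                   # zero-filled if fewer are found
    k = min(n_formants, len(freqs))
    out[:k] = freqs[:k]
    return out

def fwenn_features(x, fs):
    """Concatenate 5 formants and 7 wavelet packet entropies into 12 features."""
    return np.concatenate([lpc_formants(x, fs), wavelet_packet_entropies(x)])

# Synthetic vowel-like test signal (three sinusoids plus weak noise).
fs = 8000
t = np.arange(1024) / fs
rng = np.random.default_rng(0)
signal = (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
          + 0.25 * np.sin(2 * np.pi * 2600 * t) + 0.01 * rng.standard_normal(t.size))
features = fwenn_features(signal, fs)
print(features.shape)
```

The resulting vector is what a second-stage classifier, such as a feed-forward network with 12 inputs, would consume. A production system would more likely use a dedicated wavelet library and a formant tracker with bandwidth checks.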
