Wavelet entropy and neural network for text-independent speaker identification

This study develops a speech-based, text-independent speaker identification method that combines the wavelet transform (WT) with a neural network. The feature extraction method uses the first five formants together with the Shannon entropy of the wavelet packet (WP) decomposition at level four. The resulting thirty-five features are fed to a feed-forward backpropagation neural network (FFPBNN) for classification. Feature extraction and classification are both performed by the wavelet packet and formants neural network (WPFNN) expert system. The reported results show that the proposed method is effective, with an average identification rate of 91.09%. Two published methods were investigated for comparison, and WPFNN achieved the best recognition rate of the three. The discrete wavelet transform (DWT) was also studied as a way to improve the system's robustness against noise of -2 dB.
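The entropy part of the feature vector can be illustrated with a minimal sketch. The abstract does not name the mother wavelet, the entropy normalization, or how the thirty-five features are split, so the following are assumptions made only for illustration: a Haar wavelet, the non-normalized Shannon entropy -sum(c^2 * log(c^2)), and one entropy value per node at levels one through four of a full wavelet packet tree (2 + 4 + 8 + 16 = 30 values, which would reach thirty-five once five formants are appended). The function names are hypothetical and formant estimation is not shown.

```python
import math

def haar_step(x):
    """Split a signal into Haar approximation and detail halves (assumed wavelet)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet_nodes(signal, max_level=4):
    """Return all nodes of a full wavelet packet tree, levels 1..max_level."""
    nodes = []
    current = [list(signal)]
    for _ in range(max_level):
        next_level = []
        for node in current:
            # Unlike the plain DWT, both halves are decomposed again.
            approx, detail = haar_step(node)
            next_level.extend([approx, detail])
        nodes.extend(next_level)
        current = next_level
    return nodes

def shannon_entropy(coeffs):
    """Non-normalized Shannon entropy, with 0 * log(0) taken as 0 (assumed form)."""
    total = 0.0
    for c in coeffs:
        energy = c * c
        if energy > 0.0:
            total -= energy * math.log(energy)
    return total

def wp_entropy_features(signal, max_level=4):
    """One entropy value per wavelet packet node."""
    return [shannon_entropy(node) for node in wavelet_packet_nodes(signal, max_level)]

# Toy 64-sample frame standing in for a speech segment.
frame = [math.sin(2 * math.pi * 5 * n / 64) + 0.5 * math.sin(2 * math.pi * 13 * n / 64)
         for n in range(64)]
features = wp_entropy_features(frame)
print(len(features))
```

In a pipeline of the kind the abstract describes, a vector like `features` would be concatenated with the five formant values and passed to the classifier. The frame length should be divisible by 2 to the power of the decomposition level so that every split is even.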
