Speaker Identification: A Hybrid Approach Using Neural Networks and Wavelet Transform

In speaker identification systems, a database is constructed from the speech samples of known speakers. The approach implemented in this paper is a hybrid one, in which the wavelet transform and neural networks are combined to form a system with improved performance. Features are extracted by applying a discrete wavelet transform (DWT), while a neural network (NN) is used to build the system database and to handle decision making. The neural network is trained on the resulting feature vectors. A criterion that depends on both the false acceptance ratio (FAR) and the false rejection ratio (FRR) is used to evaluate system performance. To test the proposed system, a set of 25 male and female speakers of varying ages was used. The results of admitting the members of this set to a secure system were computed and are presented. The evaluation parameters obtained are FAR = 14.5% and FRR = 24.5%.
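
As a rough illustration of the two measurable pieces of this pipeline, the sketch below computes DWT-based feature vectors and the FAR/FRR pair from a set of scores. The abstract does not name the wavelet family, the decomposition depth, the features derived from the coefficients, or the decision threshold, so the Haar wavelet, the three levels, the log-energy features, and the function names (`haar_dwt_step`, `dwt_energy_features`, `far_frr`) are all assumptions made for this example and not the authors' implementation. The neural network stage is omitted.

```python
import math


def haar_dwt_step(signal):
    """One level of a Haar DWT (assumed wavelet): returns (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_energy_features(signal, levels=3):
    """Log-energy of each detail band, followed by the final approximation band.

    Log-energy per sub-band is an assumed feature choice for illustration.
    """
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        features.append(math.log(sum(d * d for d in detail) + 1e-12))
    features.append(math.log(sum(a * a for a in approx) + 1e-12))
    return features


def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: fraction of impostor trials accepted.
    FRR: fraction of genuine trials rejected.
    A trial is accepted when its score is at or above the threshold.
    """
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


if __name__ == "__main__":
    # Hypothetical frame of 8 samples; real input would be a speech frame.
    frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    print(len(dwt_energy_features(frame, levels=3)))  # 4 features: 3 detail bands + 1 approximation

    # Hypothetical classifier scores, not data from the paper.
    print(far_frr([0.9, 0.8, 0.4, 0.7], [0.1, 0.6, 0.3, 0.2], threshold=0.5))  # (0.25, 0.25)
```

In a system of the kind described, feature vectors like these would be the inputs to the neural network, and the threshold would be tuned to trade FAR against FRR.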
