A Wavelet-Based Approach for Speaker Identification from Degraded Speech

This paper presents a robust method for speaker identification from degraded speech signals. The method extracts Mel-frequency cepstral coefficients (MFCCs) both from the degraded speech signals themselves and from their wavelet transforms. MFCC-based speaker identification is known to lack robustness in the presence of noise and telephone-channel degradations, so extracting features from the wavelet transform of the degraded signals adds speech features from the approximation and detail components, which helps achieve higher identification rates. Neural networks are used in the proposed method for feature matching. A comparative study between the proposed method and the traditional MFCC-based feature extraction method, on speech degraded by additive white Gaussian noise (AWGN), colored noise, and telephone-channel distortion, shows that the proposed method improves recognition rates across the different degradation cases considered.
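The feature-combination idea described above can be sketched in code. This is a minimal, illustrative numpy-only sketch, not the paper's implementation: a simplified real cepstrum stands in for full MFCC extraction (a real MFCC pipeline would add a mel filterbank before the inverse transform), a one-level Haar transform stands in for the paper's wavelet decomposition, and the frame size, hop size, and coefficient count are assumed values.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar wavelet transform: returns (approximation, detail)."""
    x = x[: len(x) // 2 * 2]  # truncate to even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # approximation (low-pass)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # detail (high-pass)
    return a, d

def cepstral_features(x, n_coeffs=13, frame_len=256, hop=128):
    """Simplified cepstral features: real cepstrum per windowed frame.
    Stands in for MFCC extraction; true MFCCs would warp the spectrum
    onto a mel filterbank before the log and inverse transform."""
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * np.hanning(frame_len)
        spectrum = np.abs(np.fft.rfft(frame)) + 1e-10  # avoid log(0)
        ceps = np.fft.irfft(np.log(spectrum))          # real cepstrum
        frames.append(ceps[:n_coeffs])
    return np.array(frames)

def combined_features(signal):
    """Concatenate cepstral features of the signal with those of its
    wavelet approximation and detail components, as the paper proposes."""
    a, d = haar_dwt(signal)
    feats = [cepstral_features(s).mean(axis=0) for s in (signal, a, d)]
    return np.concatenate(feats)  # one 3 * n_coeffs vector per utterance

rng = np.random.default_rng(0)
noisy_speech = rng.standard_normal(8000)  # stand-in for a degraded utterance
fv = combined_features(noisy_speech)
print(fv.shape)  # (39,)
```

The resulting 39-dimensional vector (13 coefficients from each of the signal, approximation, and detail components) is the kind of augmented feature vector that would then be fed to the neural network for feature matching.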
