A Wavelet Packet and Mel-Frequency Cepstral Coefficients-Based Feature Extraction Method for Speaker Identification

Abstract: One of the most widely used approaches to feature extraction in speaker recognition is the filter bank-based Mel-Frequency Cepstral Coefficients (MFCC) approach. The main goal of feature extraction in this context is to extract features from raw speech that capture the unique characteristics of a particular individual. During feature extraction, the discrete Fourier transform (DFT) is typically employed to compute the spectrum of the speech waveform. Over the past few years, however, the discrete wavelet transform (DWT) has gained considerable attention and has been favored over the DFT in a wide variety of applications. The wavelet packet transform (WPT) is an extension of the DWT that adds flexibility to the decomposition process. This work studies the impact on performance, with respect to accuracy and efficiency, of substituting the WPT for the DFT in the MFCC method. The novelty of our approach lies in its focus on the wavelet and the decomposition level as the parameters influencing performance. We compare the performance of the DFT with that of the WPT, as well as with our previous work using the DWT. The WPT is shown to yield a significantly lower order for the Gaussian Mixture Model (GMM) used to model speech, and a marginal improvement in accuracy relative to the DFT. The WPT mirrors the DWT in terms of GMM order and can perform as well as the DWT under certain conditions.
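To make the substitution concrete, the following is a minimal sketch of cepstral features computed from wavelet packet subbands rather than from a DFT spectrum. It is not the authors' pipeline: the Haar wavelet, the decomposition level of 3, the function names, and the omission of mel-scale filter bank weighting are all assumptions made for illustration. The wavelet and the level are exposed as the two choices the abstract identifies as influencing performance.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet(frame, level):
    """Full wavelet packet tree: every node is split, not only the approximation.

    Returns 2**level subbands in natural (tree) order, which is not
    the same as increasing-frequency order.
    """
    if len(frame) % (2 ** level) != 0:
        raise ValueError("frame length must be divisible by 2**level")
    nodes = [list(frame)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(values[k] * math.cos(math.pi * c * (k + 0.5) / n) for k in range(n))
        for c in range(n_coeffs)
    ]

def wpt_cepstrum(frame, level=3, n_coeffs=6, eps=1e-12):
    """Log subband energies from the WPT, decorrelated with a DCT.

    Assumption: subband energies stand in for mel filter bank outputs;
    a real system would map or weight subbands to approximate the mel scale.
    """
    subbands = wavelet_packet(frame, level)
    log_energy = [math.log(sum(c * c for c in band) + eps) for band in subbands]
    return dct2(log_energy, n_coeffs)

# Hypothetical 64-sample frame containing a single sinusoid.
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
features = wpt_cepstrum(frame, level=3, n_coeffs=6)
print(features)
```

Because the Haar filters are orthonormal, the total energy across the subbands equals the energy of the input frame, which makes the log-energy step comparable in spirit to the power spectrum used in the DFT-based method. Swapping in a longer wavelet would require replacing `haar_step` with the corresponding filter pair and a boundary-handling rule.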
