Auditory Perception Based Admissible Wavelet Packet Trees For Speech Recognition

This paper presents the use of auditory-perception-based admissible wavelet packet trees (WPTs) to partition speech frequencies into bands that follow the Mel scale or the Bark scale. The proposed WPTs are selected using a root mean square error (RMSE) criterion. They mimic the Mel or Bark scale, and hence the human auditory system, more closely. The performance of features obtained from the proposed WPTs is compared with that of Mel-frequency cepstral coefficients (MFCCs). The algorithms are evaluated on the NIST TI-46 isolated-word database, with a hidden Markov model (HMM) as the classifier. Experimental results show that the proposed features outperform MFCCs and other wavelet-based features for isolated word recognition (IWR).
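The RMSE-based tree selection described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper:

- Wavelet packet nodes are treated as ideal, non-overlapping bands, with leaves listed in frequency order.
- The Mel scale uses the common 2595·log10(1 + f/700) formula.
- The Bark scale uses Traunmüller's approximation without its end corrections.
- The selection criterion is hypothetical. It is the RMSE, measured on the perceptual scale, between a tree's band edges and the edges of the same number of bands spaced uniformly on that scale.

The paper's exact RMSE definition, sampling rate, and candidate trees may differ. The function names and example trees are illustrative only.

```python
import math

def hz_to_mel(f):
    """Common Mel formula (one of several variants in use)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hz_to_bark(f):
    """Traunmuller's Bark approximation, without low/high-end corrections."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    return 1960.0 * (z + 0.53) / (26.28 - z)

SCALES = {"mel": (hz_to_mel, mel_to_hz), "bark": (hz_to_bark, bark_to_hz)}

def leaf_edges(leaves, fs):
    """Band edges (Hz) of a WPT given its leaves as (level, index) pairs.

    Assumption: ideal bands in frequency order, so the node (level, index)
    covers [index, index + 1] * (fs / 2) / 2**level.
    """
    nyquist = fs / 2.0
    edges = [0.0]
    for level, index in leaves:
        width = nyquist / 2 ** level
        lo, hi = index * width, (index + 1) * width
        if not math.isclose(lo, edges[-1]):
            raise ValueError("leaves must tile [0, fs/2] in frequency order")
        edges.append(hi)
    if not math.isclose(edges[-1], nyquist):
        raise ValueError("leaves do not reach fs/2")
    return edges

def target_edges(n_bands, fs, to_scale, from_scale):
    """Edges of n_bands bands equally spaced on a perceptual scale."""
    lo, hi = to_scale(0.0), to_scale(fs / 2.0)
    return [from_scale(lo + i * (hi - lo) / n_bands) for i in range(n_bands + 1)]

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def tree_rmse(leaves, fs, scale="mel"):
    """Hypothetical criterion: RMSE between tree edges and ideal edges,
    both expressed on the chosen perceptual scale (Mel or Bark units)."""
    to_s, from_s = SCALES[scale]
    edges = leaf_edges(leaves, fs)
    target = target_edges(len(leaves), fs, to_s, from_s)
    return rmse([to_s(e) for e in edges], [to_s(t) for t in target])

if __name__ == "__main__":
    fs = 8000  # assumed sampling rate, for illustration only
    uniform = [(3, i) for i in range(8)]  # eight 500 Hz bands
    # Finer resolution at low frequencies, coarser at high frequencies.
    mel_like = [(4, 0), (4, 1), (4, 2), (4, 3), (3, 2), (3, 3), (2, 2), (2, 3)]
    for name, tree in (("uniform", uniform), ("mel-like", mel_like)):
        print(name, tree_rmse(tree, fs, "mel"), tree_rmse(tree, fs, "bark"))
```

Comparing edges on the perceptual scale rather than in hertz keeps the wide high-frequency bands from dominating the error. A search over candidate admissible trees would call `tree_rmse` on each candidate and keep the one with the smallest value.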
