A new representation for speech frame recognition based on redundant wavelet filter banks

Although the conventional wavelet transform has multi-resolution properties, it is not optimized for speech recognition systems. It performs worse than Mel Frequency Cepstral Coefficients (MFCCs), whose Mel scale is based on human auditory perception. In this paper, new speech representations based on redundant wavelet filter-banks (RWFB) are proposed. RWFB parameters are much less shift-sensitive than those of the critically sampled discrete wavelet transform (DWT) and offer better time-frequency localization, so they are expected to perform better in speech recognition tasks. This improvement comes at the expense of higher redundancy. Several types of wavelet representations are introduced, including a combination of the critically sampled DWT with different multi-channel redundant filter-banks down-sampled by 2. To find appropriate filter values for the multi-channel filter-banks, the effects of changing the zero moments of the proposed wavelets are discussed. The performance of the corresponding methods is compared in a phoneme recognition task using time delay neural networks. The results show that redundant multi-channel wavelet filter-banks work better than the conventional DWT in speech recognition systems. The proposed four-channel higher density discrete wavelet filter-bank yields an increase in recognition rate of up to approximately 8.95% compared with the critically sampled two-channel wavelet filter-bank.
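The shift-sensitivity argument can be illustrated with a toy example. The sketch below is not the higher density filter-bank proposed in the paper. It assumes a one-level Haar high-pass filter with circular boundary handling, and it compares the critically sampled output with a fully undecimated output, which is the most redundant case. The function names and the test signal are hypothetical and chosen only for illustration.

```python
import math


def haar_detail_decimated(x):
    """One-level Haar detail coefficients, critically sampled.

    Only every second filter output is kept (down-sampling by 2).
    Circular boundary handling is an assumption of this sketch.
    """
    n = len(x)
    return [(x[i] - x[(i + 1) % n]) / math.sqrt(2) for i in range(0, n, 2)]


def haar_detail_redundant(x):
    """Same Haar high-pass filter without down-sampling.

    One coefficient is produced per input sample, so the output is
    twice as long as the critically sampled one.
    """
    n = len(x)
    return [(x[i] - x[(i + 1) % n]) / math.sqrt(2) for i in range(n)]


def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)


def shift(x, k=1):
    """Circularly shift a list by k samples to the right."""
    k %= len(x)
    return x[-k:] + x[:-k]


# Toy "frame": a step edge (hypothetical data, not from the paper).
signal = [0, 0, 0, 0, 1, 1, 1, 1]
shifted = shift(signal, 1)

print("decimated energy, original:", energy(haar_detail_decimated(signal)))
print("decimated energy, shifted: ", energy(haar_detail_decimated(shifted)))
print("redundant energy, original:", energy(haar_detail_redundant(signal)))
print("redundant energy, shifted: ", energy(haar_detail_redundant(shifted)))
```

On this input, a one-sample shift changes the detail energy of the critically sampled output from 0 to about 1, because the edge moves from between the retained output positions onto one of them. The undecimated output keeps the same energy of about 1, since its coefficients simply shift with the signal. The filter-banks described in the abstract are still down-sampled by 2 and gain their redundancy from extra channels, so they sit between these two extremes.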
