Speech frame recognition based on less shift sensitive wavelet filter banks

The wavelet transform offers multi-resolution analysis and good time-frequency localization; hence, it can be optimized for speech recognition. In our previous work, we showed that features from redundant wavelet filter banks perform better in speech recognition tasks because they are much less shift sensitive than those of the critically sampled discrete wavelet transform (DWT). In this paper, three types of wavelet representations are introduced: features based on the dual-tree complex wavelet transform (DT-CWT), the perceptual dual-tree complex wavelet transform, and the four-channel double-density discrete wavelet transform (FCDDDWT). Appropriate filter values for the DT-CWT and FCDDDWT are then proposed. The proposed wavelet representations are compared in a phoneme recognition task using a special form of time-delay neural network. The evaluations confirm that dual-tree complex wavelet filter banks outperform the conventional DWT in speech recognition systems. The proposed perceptual dual-tree complex wavelet filter bank increases the recognition rate by up to approximately 9.82% compared to the critically sampled two-channel wavelet filter bank.
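The shift sensitivity mentioned above can be illustrated with a minimal sketch. It does not implement the DT-CWT or FCDDDWT proposed in the paper. It assumes a one-level Haar filter, circular boundary handling, and a toy 16-sample step signal, all chosen only for illustration, and it compares the detail-band energy of a critically sampled transform with that of an undecimated (redundant) one when the input is shifted by one sample.

```python
import numpy as np

def haar_detail_decimated(x):
    """One-level Haar detail coefficients, downsampled by 2 (critically sampled).
    Assumes len(x) is even."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def haar_detail_undecimated(x):
    """One-level Haar detail coefficients without downsampling
    (a simple redundant transform, with circular boundary handling)."""
    x = np.asarray(x, dtype=float)
    return (x - np.roll(x, -1)) / np.sqrt(2.0)

def energy(coeffs):
    """Sum of squared coefficients."""
    return float(np.sum(np.asarray(coeffs) ** 2))

# Toy "frame" (an assumption, not speech data): a step edge at sample 8.
frame = np.zeros(16)
frame[8:] = 1.0
shifted = np.roll(frame, 1)  # the same frame, circularly delayed by one sample

for name, transform in [("decimated", haar_detail_decimated),
                        ("undecimated", haar_detail_undecimated)]:
    e0 = energy(transform(frame))
    e1 = energy(transform(shifted))
    print(f"{name:12s} detail energy: original={e0:.3f}, shifted={e1:.3f}")
```

In this toy case the decimated detail energy changes when the frame is delayed by one sample, while the undecimated detail energy does not. That is the kind of feature instability across frame alignment that redundant filter banks are meant to reduce.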
