Phase Autocorrelation Bark Wavelet Transform (PACWT) Features for Robust Speech Recognition

This paper proposes a new feature-extraction method to improve the noise robustness of speech recognition systems. The method combines the benefits of phase autocorrelation (PAC) with those of the Bark wavelet transform. PAC measures correlation by an angle rather than by the conventional autocorrelation value, whereas the Bark wavelet transform is a wavelet transform designed specifically for speech signals. The features extracted by the combined method are called phase autocorrelation Bark wavelet transform (PACWT) features. Their recognition performance is evaluated against the conventional mel-frequency cepstral coefficient (MFCC) features on the TI-Digits database, which is divided into male and female data, under several noise types and noise levels. The results show that for noisy male data (white noise at 0 dB SNR) the word recognition rate is 60% with the PACWT features, compared with 41.35% with the MFCC features under identical conditions.
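The angle-based correlation measure mentioned above can be illustrated with a minimal sketch. It assumes the commonly described PAC formulation, in which the correlation at lag k is the angle between a frame and its circularly shifted copy, i.e. the arccosine of the energy-normalized circular autocorrelation. The function name `phase_autocorrelation` and the zero-energy convention are illustrative assumptions, not taken from the paper, and the Bark wavelet stage is not shown.

```python
import numpy as np

def phase_autocorrelation(frame):
    """Angle-based autocorrelation of one frame (illustrative sketch).

    Assumption: PAC at lag k is the angle between the frame and the
    same frame circularly shifted by k samples.
    """
    x = np.asarray(frame, dtype=float)
    energy = np.dot(x, x)
    if energy == 0.0:
        # The angle is undefined for an all-zero frame; returning zeros
        # is an arbitrary convention chosen for this sketch.
        return np.zeros(len(x))
    # Circular autocorrelation via the FFT: inverse transform of |X|^2.
    spectrum = np.fft.fft(x)
    circ_acf = np.fft.ifft(np.abs(spectrum) ** 2).real
    # Dividing by the frame energy gives the cosine of the angle;
    # clipping guards against rounding slightly outside [-1, 1].
    cos_theta = np.clip(circ_acf / energy, -1.0, 1.0)
    return np.arccos(cos_theta)

if __name__ == "__main__":
    frame = np.array([1.0, 0.0, -1.0, 0.0])
    print(phase_autocorrelation(frame))
```

Because the circular autocorrelation is divided by the frame energy before the arccosine is taken, scaling the frame by a positive constant leaves the resulting angles unchanged.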
