Selective Gammatone Envelope Feature for Robust Sound Event Recognition

Conventional features for automatic speech recognition and sound event recognition, such as Mel-Frequency Cepstral Coefficients (MFCCs), have been shown to perform poorly in noisy conditions. We introduce an auditory feature based on the gammatone filterbank, the Selective Gammatone Envelope Feature (SGEF), for robust sound event recognition, in which channel selection and the filterbank envelope are used to reduce the effect of noise in specific noise environments. In experiments with Hidden Markov Model (HMM) recognizers, we show that our feature significantly outperforms MFCCs in four different noisy environments at various signal-to-noise ratios.

Key words: gammatone filterbank, HMM, robust recognition, sound event recognition

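To make the idea concrete, below is a minimal sketch of a gammatone-envelope feature with per-channel selection. It is not the published SGEF algorithm: the function names (`sgef_like_features`, `gammatone_ir`), the noise-floor estimate taken from the first few frames, and the fixed SNR threshold for discarding channels are all illustrative assumptions, used only to show the general pipeline of filterbank analysis, envelope extraction, channel selection, and a cepstral-style transform.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore approximation)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Impulse response of a single gammatone filter centred at fc (Hz)."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

def sgef_like_features(signal, fs, n_channels=32, fmin=100.0,
                       frame_len=0.025, frame_shift=0.010, n_ceps=13,
                       noise_frames=10, snr_threshold_db=0.0):
    """Toy gammatone-envelope feature with per-channel selection.

    Channels whose frame energy stays close to an initial noise-floor
    estimate are suppressed before the cepstral transform (a crude
    stand-in for the channel selection described in the paper).
    """
    # Centre frequencies spaced logarithmically up to near the Nyquist rate.
    fmax = 0.45 * fs
    fcs = np.geomspace(fmin, fmax, n_channels)

    flen = int(frame_len * fs)
    fshift = int(frame_shift * fs)
    n_frames = 1 + max(0, (len(signal) - flen) // fshift)

    # Per-channel, per-frame envelope energies.
    env = np.zeros((n_channels, n_frames))
    for c, fc in enumerate(fcs):
        y = np.convolve(signal, gammatone_ir(fc, fs), mode="same")
        e = np.abs(y)  # rectification as a simple envelope
        for m in range(n_frames):
            seg = e[m * fshift: m * fshift + flen]
            env[c, m] = np.mean(seg ** 2) + 1e-12

    # Noise floor per channel from the first frames (assumed noise-only).
    noise_floor = np.mean(env[:, :noise_frames], axis=1, keepdims=True)
    snr_db = 10.0 * np.log10(env / noise_floor)

    # Channel selection: suppress noise-dominated channels before the DCT.
    log_env = np.log(env)
    log_env[snr_db < snr_threshold_db] = np.log(1e-12)

    # DCT-II across channels -> cepstral-like coefficients per frame.
    n = np.arange(n_channels)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_channels))
    return basis @ log_env  # shape: (n_ceps, n_frames)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(fs)
    feats = sgef_like_features(noisy, fs)
    print(feats.shape)  # (13, number_of_frames)
```

The resulting feature matrix could then be fed, with the usual delta and acceleration coefficients, to an HMM recognizer in the same way MFCCs are used.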