An Efferent-Inspired Auditory Model Front-End for Speech Recognition

In this paper, we investigate a closed-loop auditory model and explore its potential as a feature representation for speech recognition. The closed-loop representation consists of an auditory-based, efferent-inspired feedback mechanism that regulates the operating point of a filter bank, enabling it to adapt dynamically to changing background noise. With this dynamic adaptation, the closed-loop representation compensates for the effects of noise on speech and generates a consistent feature representation for speech contaminated by different kinds of noise. Our preliminary experimental results indicate that the efferent-inspired feedback mechanism enables the closed-loop auditory model to consistently improve word recognition accuracy compared with an open-loop representation under mismatched training and test noise conditions in a connected digit recognition task.

Index Terms: efferent, auditory model, feature extraction
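To make the feedback loop concrete, the following is a minimal sketch in Python of a closed-loop filter-bank front-end. It is an illustration under simplifying assumptions, not the paper's model: the "operating point" of each channel is reduced to a scalar gain that slow feedback drives toward a target output level, loosely mimicking efferent suppression of noise-dominated channels. The function names and parameters (make_bandpass_bank, closed_loop_features, target_level, feedback_rate) are hypothetical.

import numpy as np
from scipy.signal import butter, lfilter

def make_bandpass_bank(center_freqs_hz, fs, bw_octaves=0.5):
    """Butterworth bandpass coefficients, one (b, a) pair per channel."""
    bank = []
    for fc in center_freqs_hz:
        lo = fc * 2.0 ** (-bw_octaves / 2)
        hi = fc * 2.0 ** (bw_octaves / 2)
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        bank.append((b, a))
    return bank

def closed_loop_features(x, fs, center_freqs_hz, frame_len=400,
                         target_level=0.01, feedback_rate=0.1):
    """Frame-level log-energy features with per-channel adaptive gain.

    Hypothetical stand-in for the efferent loop: after each frame, the
    channel gain (its "operating point") is nudged toward a target RMS
    output level, so channels dominated by steady noise are attenuated.
    """
    bank = make_bandpass_bank(center_freqs_hz, fs)
    gains = np.ones(len(bank))          # one operating point per channel
    n_frames = len(x) // frame_len
    feats = np.zeros((n_frames, len(bank)))
    for t in range(n_frames):
        frame = x[t * frame_len:(t + 1) * frame_len]
        for c, (b, a) in enumerate(bank):
            # Filter state is reset each frame here; a real front-end
            # would carry it across frames.
            y = gains[c] * lfilter(b, a, frame)
            level = np.sqrt(np.mean(y ** 2)) + 1e-12
            feats[t, c] = np.log(level)
            # Efferent-like feedback: slow multiplicative update toward
            # the target level (feedback_rate sets adaptation speed).
            gains[c] *= (target_level / level) ** feedback_rate
    return feats

if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(fs)  # tone in noise
    cfs = [250, 500, 1000, 2000, 3000]
    print(closed_loop_features(x, fs, cfs).shape)   # -> (20, 5)

Because the gain update depends only on each channel's own output statistics, the loop settles to similar operating points whenever the long-term background noise is similar, which is the intuition behind the consistent feature representation claimed above. The model described in the paper is considerably more elaborate than this sketch.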
