Noise suppression based on neurophysiologically-motivated SNR estimation for robust speech recognition

A novel noise suppression scheme for speech signals is proposed, based on a neurophysiologically-motivated estimate of the local signal-to-noise ratio (SNR) in different frequency channels. For SNR estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal characteristics of the respective analysis frame and which imitate the representation of modulation frequencies in higher stages of the mammalian auditory system. A neural network analyses AMS patterns generated from noisy speech and estimates the local SNR. Noise is suppressed by attenuating each frequency channel according to its estimated SNR. The noise suppression algorithm is evaluated in speaker-independent digit recognition experiments and compared with noise suppression by Spectral Subtraction.
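The final step, attenuating each frequency channel according to its estimated SNR, can be illustrated with a minimal sketch. The abstract does not state the attenuation rule, so the Wiener-style gain, the -20 dB gain floor, and the names `snr_to_gain` and `suppress_frame` used here are assumptions made for illustration only, not the method of the paper. The per-channel SNR values are taken as given; in the proposed scheme they would come from the neural network operating on AMS patterns.

```python
def snr_to_gain(snr_db, floor_db=-20.0):
    """Map one channel's estimated SNR (in dB) to a linear amplitude gain.

    Assumption: a Wiener-style rule g = snr / (1 + snr), bounded below by
    a gain floor so that noisy channels are attenuated but never zeroed.
    """
    snr_lin = 10.0 ** (snr_db / 10.0)    # dB -> linear power ratio
    gain = snr_lin / (1.0 + snr_lin)     # approaches 1 at high SNR, 0 at low SNR
    floor = 10.0 ** (floor_db / 20.0)    # dB -> linear amplitude floor
    return max(gain, floor)


def suppress_frame(magnitudes, snr_db_per_channel, floor_db=-20.0):
    """Scale each channel magnitude of one analysis frame by its SNR-derived gain."""
    if len(magnitudes) != len(snr_db_per_channel):
        raise ValueError("need exactly one SNR estimate per frequency channel")
    return [m * snr_to_gain(s, floor_db)
            for m, s in zip(magnitudes, snr_db_per_channel)]
```

With this rule, a channel at 0 dB SNR is scaled by 0.5, while a channel at -40 dB is limited to the floor gain of 0.1 rather than being removed entirely. A floor of this kind is a common way to trade residual noise against processing artifacts.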
