Estimation of the signal-to-noise ratio with amplitude modulation spectrograms

An algorithm is proposed which automatically estimates the local signal-to-noise ratio (SNR) between speech and noise. The feature extraction stage of the algorithm is motivated by neurophysiological findings on amplitude modulation processing in higher stages of the mammalian auditory system. It analyzes information on both the center frequencies and the amplitude modulations of the input signal and represents this information in two-dimensional, so-called amplitude modulation spectrograms (AMS). A neural network is trained on a large number of AMS patterns generated from mixtures of speech and noise. After training, the network supplies estimates of the local SNR when AMS patterns from "unknown" sound sources are presented. Classification experiments show a relatively accurate estimation of the SNR within independent 32 ms analysis frames. Harmonicity appears to be the most important cue for analysis frames to be classified as "speech-like", but the spectro-temporal representation of sound in AMS patterns also allows for a reliable discrimination between unvoiced speech and noise.

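To illustrate the kind of front end described in the abstract, the following minimal sketch computes a coarse AMS pattern for a single short analysis frame: short-time FFTs of overlapping sub-windows yield band envelopes over time, and a second FFT along the time axis of each band gives its modulation spectrum, producing a two-dimensional pattern (acoustic frequency versus modulation frequency). The function name `ams_pattern`, the band counts, and the sub-window lengths are illustrative assumptions, not the authors' exact parameters.

```python
import numpy as np

def ams_pattern(frame, fs=16000, n_bands=15, n_mod=15,
                sub_win=4.0e-3, sub_hop=0.25e-3):
    """Sketch of an amplitude modulation spectrogram (AMS) pattern.

    The frame is cut into overlapping sub-windows; the magnitude FFT of each
    sub-window gives band envelopes over time, and a second FFT along the
    time axis of every band yields its modulation spectrum.  Returns a 2-D
    array with shape (acoustic band, modulation bin).  Parameter values are
    assumptions for illustration only.
    """
    win_len = int(sub_win * fs)          # e.g. 4 ms sub-windows
    hop = int(sub_hop * fs)              # dense hopping for envelope sampling
    window = np.hanning(win_len)

    # Short-time magnitude spectra: one row per sub-window position
    starts = range(0, len(frame) - win_len + 1, hop)
    spectra = np.array([np.abs(np.fft.rfft(window * frame[s:s + win_len]))
                        for s in starts])                 # (time, fft_bin)

    # Collapse FFT bins into a small number of acoustic-frequency bands
    bands = np.array_split(spectra, n_bands, axis=1)
    envelopes = np.stack([b.mean(axis=1) for b in bands])  # (band, time)

    # Second FFT across time: modulation spectrum of each band envelope
    envelopes = envelopes - envelopes.mean(axis=1, keepdims=True)
    mod_spec = np.abs(np.fft.rfft(envelopes, axis=1))
    return mod_spec[:, 1:n_mod + 1]      # drop DC, keep low modulation bins

# Example: a 32 ms frame of noise at 16 kHz yields a 15 x 15 AMS pattern
frame = np.random.randn(512)
print(ams_pattern(frame).shape)          # -> (15, 15)
```

In the study itself, such patterns (one per 32 ms frame) are presented to a trained feed-forward network that outputs a local SNR estimate; in this sketch, the flattened 15 x 15 pattern would simply serve as the network's input vector.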