Channel selection in the modulation domain for improved speech intelligibility in noise.

Background noise reduces the depth of the low-frequency envelope modulations that are known to be important for speech intelligibility. The relative strength of the target and masker envelope modulations can be quantified with a modulation signal-to-noise ratio, (S/N)_mod. Such a measure can be used in noise-suppression algorithms to extract target-relevant modulations from the corrupted (target + masker) envelopes, with the aim of improving speech intelligibility. In the present study, envelopes are decomposed in the modulation spectral domain into a number of channels spanning 0-30 Hz. In each channel, target-dominant modulations are identified and retained according to the (S/N)_mod selection criterion, while masker-dominated modulations, which may interfere with perception of the target, are discarded. The impact of this modulation-selective processing on the speech-reception threshold for sentences in noise is assessed with normal-hearing listeners. Results indicate that the intelligibility of noise-masked speech can be improved by as much as 13 dB when target-dominant modulations, present up to a modulation frequency of 18 Hz, are preserved and masker-dominant modulations are discarded from the mixture envelopes.
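The selection step described above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not specify the channel edges, the exact definition of (S/N)_mod, or the selection threshold, so the sketch assumes hypothetical band edges, takes (S/N)_mod to be the ratio of target to masker modulation power within a channel, uses a 0 dB threshold, and always keeps the mean (0 Hz) term of the mixture envelope. The function name and all parameters are illustrative, and the target and masker envelopes are assumed to be available separately.

```python
import numpy as np

def select_modulation_channels(target_env, masker_env, mixture_env, fs_env,
                               edges_hz=(0, 2, 4, 8, 16, 30), threshold_db=0.0):
    """Keep the modulation channels of the mixture envelope that the target dominates.

    All envelopes are 1-D arrays of equal length sampled at fs_env (Hz).
    edges_hz and threshold_db are assumed values, not taken from the study.
    """
    n = len(mixture_env)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_env)   # modulation frequency of each bin
    T = np.fft.rfft(target_env)                  # target modulation spectrum
    M = np.fft.rfft(masker_env)                  # masker modulation spectrum
    X = np.fft.rfft(mixture_env)                 # mixture modulation spectrum

    Y = np.zeros_like(X)
    Y[0] = X[0]          # assumption: the envelope mean (0 Hz) is always kept
    eps = 1e-12          # guards against division by zero in empty channels
    snr_db, keep = [], []

    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        band = (freqs > lo) & (freqs <= hi)      # bins belonging to this channel
        p_target = np.sum(np.abs(T[band]) ** 2)
        p_masker = np.sum(np.abs(M[band]) ** 2)
        ratio_db = 10.0 * np.log10((p_target + eps) / (p_masker + eps))
        retained = bool(ratio_db > threshold_db)
        if retained:
            Y[band] = X[band]                    # retain target-dominant channel
        snr_db.append(ratio_db)
        keep.append(retained)

    # Components above the top edge stay zero, so they are dropped in this sketch.
    processed_env = np.fft.irfft(Y, n=n)
    return processed_env, snr_db, keep


# Illustrative use: target modulated at 4 Hz, masker modulated at 12 Hz.
fs_env = 100.0
t = np.arange(200) / fs_env
target = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
masker = 1.0 + 0.5 * np.sin(2 * np.pi * 12 * t)
processed, snr_db, keep = select_modulation_channels(
    target, masker, target + masker, fs_env)
```

Summing the two envelopes to form the mixture envelope is itself a simplification made for the demonstration; envelopes of real target-plus-masker mixtures are not strictly additive.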
