A novel binary mask estimator based on sparse approximation

While most single-channel noise reduction algorithms fail to improve speech intelligibility, the ideal binary mask (IBM) has demonstrated substantial intelligibility improvements. However, the IBM is computed from oracle knowledge of the separate speech and noise signals, which is not available in practice. This paper introduces a novel binary mask estimator based on a simple sparse approximation algorithm. The estimator does not require oracle knowledge; it relies instead on knowledge of speech structure.
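
To make the oracle dependence of the IBM concrete, the following is a minimal sketch, assuming the common definition in which a time-frequency unit is kept when its local SNR exceeds a threshold. The function name `ideal_binary_mask`, the threshold parameter `lc_db`, and the toy power values are illustrative assumptions and do not come from the paper. The sketch shows only the oracle mask, not the sparse-approximation estimator the paper proposes.

```python
import math

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0, eps=1e-12):
    """Return a 0/1 mask: 1 where the local SNR in dB exceeds lc_db.

    Both inputs are lists of rows (frequency bands x time frames) holding
    the power of the clean speech and of the noise in each unit. Needing
    these two signals separately is the oracle knowledge.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            # eps avoids division by zero and log of zero (assumed safeguard).
            snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask

# Toy power values for 2 bands x 3 frames, purely illustrative.
speech_power = [[4.0, 0.1, 9.0], [0.5, 2.0, 0.01]]
noise_power = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
mask = ideal_binary_mask(speech_power, noise_power)
```

In use, the mask would be multiplied element-wise with the time-frequency representation of the noisy mixture before resynthesis. A practical estimator has to predict the same 0/1 pattern from the mixture alone.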
