Requirements for the evaluation of computational speech segregation systems.

Recent studies on computational speech segregation reported improved speech intelligibility in noise when estimating and applying an ideal binary mask with supervised learning algorithms. However, an important requirement for such systems in technical applications is their robustness to acoustic conditions not considered during training. This study demonstrates that the spectro-temporal noise variations that occur during training and testing determine the achievable segregation performance. In particular, such variations strongly affect the identification of acoustical features in the system associated with perceptual attributes in speech segregation. The results could help establish a framework for a systematic evaluation of future segregation systems.

[1]  IEEE Recommended Practice for Speech Quality Measurements , 1969, IEEE Transactions on Audio and Electroacoustics.

[2]  Wouter A. Dreschler,et al.  ICRA Noises: Artificial Noise Signals with Speech-like Spectral and Temporal Properties for Hearing Instrument Assessment: Ruidos ICRA: Señates de ruido artificial con espectro similar al habla y propiedades temporales para pruebas de instrumentos auditivos , 2001 .

[3]  W. Dreschler,et al.  ICRA noises: artificial noise signals with speech-like spectral and temporal properties for hearing instrument assessment. International Collegium for Rehabilitative Audiology. , 2001, Audiology : official organ of the International Society of Audiology.

[4]  Birger Kollmeier,et al.  SNR estimation based on amplitude modulation analysis with applications to noise suppression , 2003, IEEE Trans. Speech Audio Process..

[5]  DeLiang Wang,et al.  On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis , 2005, Speech Separation by Humans and Machines.

[6]  Lauren Calandruccio,et al.  Determination of the Potential Benefit of Time-Frequency Gain Manipulation , 2006, Ear and hearing.

[7]  DeLiang Wang,et al.  Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation. , 2006, The Journal of the Acoustical Society of America.

[8]  DeLiang Wang,et al.  Speech perception of noise with binary gains. , 2008, The Journal of the Acoustical Society of America.

[9]  Yang Lu,et al.  An algorithm that improves speech intelligibility in noise for normal-hearing listeners. , 2009, The Journal of the Acoustical Society of America.

[10]  DeLiang Wang,et al.  An algorithm to improve speech recognition in noise for hearing-impaired listeners. , 2013, The Journal of the Acoustical Society of America.

[11]  Torsten Dau,et al.  Environment-aware ideal binary mask estimation using monaural cues , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[12]  Timo Gerkmann,et al.  Generalization of supervised learning for binary mask estimation , 2014, 2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC).