Estimation of the Ideal Binary Mask using Directional Systems


The ideal binary mask is often seen as a goal for time-frequency masking algorithms that try to increase speech intelligibility, but the required availability of the unmixed signals makes it difficult to calculate the ideal binary mask in any real-life application. In this paper we derive the theory and the requirements that enable calculation of the ideal binary mask using a directional system, without the availability of the unmixed signals. The proposed method has low complexity and is verified using computer simulations in both ideal and non-ideal setups, showing promising results.

Index Terms: Time-Frequency Masking, Directional Systems, Ideal Binary Mask, Speech Intelligibility, Sound Separation
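The abstract rests on two ideas: the ideal binary mask (IBM) itself, and the claim that a directional system can stand in for the unmixed signals. The Python sketch below illustrates both, with clearly stated assumptions. `ideal_binary_mask` uses the common IBM definition: a time-frequency unit gets 1 when its local target-to-masker ratio exceeds a local criterion LC (in dB), else 0. The helpers `simulate_directional_outputs` and `directional_mask_estimate` are hypothetical and rest on a toy model: the target is in front, the masker is behind, a single `leakage` factor describes how much of the opposite source each directional output passes, and the sources are uncorrelated so that powers add. This is an illustration only, not the derivation or method from the paper.

```python
import math
from typing import List, Tuple

# Rows are frequency channels, columns are time frames; entries are linear powers.
PowerMap = List[List[float]]
Mask = List[List[int]]

EPS = 1e-12  # avoids division by zero and log10(0) in silent units


def ideal_binary_mask(target: PowerMap, masker: PowerMap, lc_db: float = 0.0) -> Mask:
    """IBM: 1 where the local target-to-masker ratio (dB) exceeds lc_db, else 0.

    Needs the unmixed target and masker powers, which is what makes the IBM
    hard to compute outside of simulations.
    """
    mask: Mask = []
    for t_row, m_row in zip(target, masker):
        mask.append([
            1 if 10.0 * math.log10((t + EPS) / (m + EPS)) > lc_db else 0
            for t, m in zip(t_row, m_row)
        ])
    return mask


def simulate_directional_outputs(
    target: PowerMap, masker: PowerMap, leakage: float
) -> Tuple[PowerMap, PowerMap]:
    """Hypothetical toy model: the front-facing output passes the target plus a
    fraction `leakage` of the rear masker power; the back-facing output does the
    reverse. Powers are simply added (sources assumed uncorrelated)."""
    front = [[t + leakage * m for t, m in zip(tr, mr)] for tr, mr in zip(target, masker)]
    back = [[m + leakage * t for t, m in zip(tr, mr)] for tr, mr in zip(target, masker)]
    return front, back


def directional_mask_estimate(front: PowerMap, back: PowerMap, lc_db: float = 0.0) -> Mask:
    """Hypothetical estimate: apply the same local-criterion test to the ratio of
    the two directional outputs, so no unmixed signals are needed."""
    return ideal_binary_mask(front, back, lc_db)


if __name__ == "__main__":
    target = [[4.0, 1.0, 1.0]]
    masker = [[1.0, 4.0, 1.0]]
    print(ideal_binary_mask(target, masker))  # [[1, 0, 0]]
    front, back = simulate_directional_outputs(target, masker, leakage=0.0)
    print(directional_mask_estimate(front, back))  # [[1, 0, 0]]
```

In this toy model, with `leakage=0.0` the directional estimate reproduces the IBM exactly. At `leakage=1.0` the two outputs become identical and the estimate collapses to all zeros, which shows why the directional patterns must separate the sources well for such an estimate to be useful.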

DeLiang Wang | Thomas Lunner | Michael Syskind Pedersen | Ulrik Kjems | Jesper B. Boldt
