Binaural deep neural network classification for reverberant speech segregation

While human listening is robust in complex auditory scenes, current speech segregation algorithms perform poorly in noisy and reverberant environments. This paper addresses robustness in binaural speech segregation by employing binary classification based on deep neural networks (DNNs). We systematically examine DNN generalization to untrained configurations. Evaluations and comparisons show that DNN-based binaural classification produces superior segregation performance in a variety of multisource and reverberant conditions.
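To make the classification idea concrete, the sketch below trains a tiny one-hidden-layer network (plain NumPy, gradient descent) to label time-frequency units as target-dominant (mask = 1) or interference-dominant (mask = 0) from two binaural features. The feature names (ITD, ILD), the synthetic feature distributions, and the network size are illustrative assumptions, not the paper's actual features or architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_synthetic_features(n):
    """Hypothetical binaural features per T-F unit: target near the median
    plane (ITD ~ 0 ms, ILD ~ 0 dB), interferer offset to one side."""
    labels = rng.integers(0, 2, size=n)
    itd = np.where(labels == 1, rng.normal(0.0, 0.1, n), rng.normal(0.6, 0.1, n))
    ild = np.where(labels == 1, rng.normal(0.0, 1.0, n), rng.normal(8.0, 1.0, n))
    return np.column_stack([itd, ild]), labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Standardize features, then train a one-hidden-layer MLP with
# cross-entropy loss and full-batch gradient descent.
X, y = make_synthetic_features(2000)
mu, sd = X.mean(0), X.std(0)
Xs = (X - mu) / sd
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.1
for _ in range(300):
    h = np.tanh(Xs @ W1 + b1)            # hidden activations
    p = sigmoid(h @ W2 + b2).ravel()     # P(target-dominant)
    g = (p - y)[:, None] / len(y)        # dL/dlogit, averaged over batch
    W2 -= lr * h.T @ g;  b2 -= lr * g.sum(0)
    gh = (g @ W2.T) * (1 - h**2)         # backprop through tanh
    W1 -= lr * Xs.T @ gh; b1 -= lr * gh.sum(0)

def estimate_mask(features, threshold=0.5):
    """Binary mask over T-F units: 1 keeps the unit, 0 discards it."""
    h = np.tanh(((features - mu) / sd) @ W1 + b1)
    return (sigmoid(h @ W2 + b2).ravel() > threshold).astype(int)

Xt, yt = make_synthetic_features(500)
mask = estimate_mask(Xt)
accuracy = (mask == yt).mean()
```

In a real system the estimated binary mask would be applied to the reverberant mixture's T-F representation to resynthesize the target; here the well-separated synthetic classes only demonstrate the classification step itself.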
