A Dual-Microphone Speech Enhancement Algorithm for Close-Talk System

While human listening is robust in complex auditory scenes, current speech enhancement algorithms perform poorly in noisy environments, even when a close-talk system is used. This paper addresses robustness in a dual-microphone close-talk system by employing a computational auditory scene analysis (CASA) framework. The energy difference between the two microphones is used as the primary separation cue to estimate the ideal binary mask (IBM). Voice activity detection is also used to identify noise-only periods and to update the separation threshold. Various interference locations and reverberant conditions are used to examine the performance of the proposed system. Evaluation and comparison show that the proposed system outperforms two other systems under the test conditions.

DOI: http://dx.doi.org/10.11591/telkomnika.v12i6.5485
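The core idea above can be sketched in a few lines: for each time-frequency unit, compare the energy received at the close (mouth-facing) microphone against the far microphone, and keep only units where the level difference exceeds a threshold. The sketch below is a simplified illustration under stated assumptions: the paper's system uses an auditory (gammatone) front end and a VAD-updated threshold, whereas this sketch uses a plain STFT and a fixed threshold; all function names and parameter values are hypothetical.

```python
# Minimal sketch of energy-difference-based binary mask estimation for a
# dual-microphone close-talk setup. Assumptions (not from the paper):
# STFT front end instead of a gammatone filterbank, and a fixed 6 dB
# threshold instead of one updated from VAD-detected noise periods.
import numpy as np

def energy_difference_mask(close_mic, far_mic, frame_len=256, hop=128,
                           threshold_db=6.0):
    """Estimate a binary T-F mask from the per-unit energy ratio
    between the close microphone and the far microphone."""
    def stft_power(x):
        n_frames = 1 + (len(x) - frame_len) // hop
        win = np.hanning(frame_len)
        frames = np.stack([x[i * hop: i * hop + frame_len] * win
                           for i in range(n_frames)])
        return np.abs(np.fft.rfft(frames, axis=1)) ** 2

    p_close = stft_power(close_mic)
    p_far = stft_power(far_mic)
    # Near-field target speech is much stronger at the close mic, while
    # distant interference arrives at both mics with similar energy, so
    # units whose level difference exceeds the threshold are kept.
    ratio_db = 10.0 * np.log10((p_close + 1e-12) / (p_far + 1e-12))
    return (ratio_db > threshold_db).astype(np.uint8)

# Toy usage: a near-field tone dominates the close mic; diffuse noise
# has roughly equal energy at both microphones.
rng = np.random.default_rng(0)
t = np.arange(4096) / 8000.0
speech = np.sin(2 * np.pi * 300 * t)
close = speech + 0.1 * rng.standard_normal(t.size)
far = 0.1 * speech + 0.1 * rng.standard_normal(t.size)
mask = energy_difference_mask(close, far)
```

In this toy example the mask is 1 almost everywhere in the frequency bins carrying the near-field tone and 0 in most noise-dominated bins, which is the behavior the energy-difference cue is meant to capture.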
