Speech separation by cost-sensitive deep learning

Deep learning based speech separation has demonstrated good performance in adverse environments. Recent studies show that multi-condition training, which trains a model on several noise scenarios, generalizes well at test time. However, treating all noise scenarios with the same training cost is usually not a good choice. A common problem is that, when the training data cover a wide range of signal-to-noise ratios (SNRs), the data from low-SNR environments suffer from large training loss, which results in a performance drop when the test SNR is low. In this paper, we propose three cost-sensitive deep learning methods to improve the performance of speech separation at low SNRs: (i) learning with a cost-sensitive objective, (ii) learning with cost-sensitive oversampling of the training data, and (iii) learning with cost-sensitive undersampling of the training data. We also propose to aggregate the three methods into a cost-sensitive deep ensemble learning method. Experimental results demonstrate the effectiveness of the proposed methods.
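
The three strategies can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the abstract does not specify the cost function, so the threshold-based cost assignment, the function names (`snr_costs`, `cost_sensitive_mse`, `oversample_indices`, `undersample_indices`), and the parameters `low_snr_cost` and `threshold_db` are all assumptions made for illustration.

```python
import numpy as np

def snr_costs(snr_db, low_snr_cost=2.0, threshold_db=0.0):
    """Hypothetical cost rule: examples at or below threshold_db get a larger cost."""
    snr_db = np.asarray(snr_db, dtype=float)
    return np.where(snr_db <= threshold_db, low_snr_cost, 1.0)

def cost_sensitive_mse(pred, target, costs):
    """(i) Cost-sensitive objective: per-example MSE weighted by cost,
    normalized by the total cost."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    per_example = np.mean((pred - target) ** 2, axis=1)
    return float(np.sum(costs * per_example) / np.sum(costs))

def oversample_indices(costs, rng):
    """(ii) Oversampling: repeat each example roughly in proportion to its cost."""
    repeats = np.maximum(1, np.round(costs / costs.min()).astype(int))
    idx = np.repeat(np.arange(len(costs)), repeats)
    return rng.permutation(idx)

def undersample_indices(costs, rng):
    """(iii) Undersampling: keep each example with probability cost / max cost,
    so the highest-cost (low-SNR) examples are always kept."""
    keep_prob = costs / costs.max()
    mask = rng.random(len(costs)) < keep_prob
    return np.flatnonzero(mask)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    costs = snr_costs([-5.0, 0.0, 5.0, 10.0])   # two low-SNR, two high-SNR examples
    pred = np.array([[1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
    target = np.zeros_like(pred)
    print("weighted loss:", cost_sensitive_mse(pred, target, costs))
    print("oversampled indices:", oversample_indices(costs, rng))
    print("undersampled indices:", undersample_indices(costs, rng))
```

In this sketch an error on a low-SNR example contributes more to the objective than the same error on a high-SNR example, while the two sampling functions achieve a similar emphasis by changing how often each example is seen during training rather than by changing the loss.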
