Speech separation by cost-sensitive deep learning

Deep-learning-based speech separation has demonstrated good performance in adverse environments. A recent study shows that multi-condition training, which trains a single model on several noise scenarios, generalizes well at test time. However, treating all noise scenarios with the same training cost is usually not a good choice. A common problem is that when the training data cover a wide range of SNRs, data from low-SNR environments incur large training losses, which degrades performance when the test SNR is low. In this paper, we propose three cost-sensitive deep learning methods to improve speech separation at low SNRs: (i) learning with a cost-sensitive objective, (ii) learning with cost-sensitive oversampling of the training data, and (iii) learning with cost-sensitive undersampling of the training data. We further aggregate the three methods into a cost-sensitive deep ensemble learning method. Experimental results demonstrate the effectiveness of the proposed methods.

Xiao-Lei Zhang
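The abstract does not say how costs are assigned to training data or how the three learners are combined, so the following is only a minimal NumPy sketch of the three ideas under loudly stated assumptions. The SNR-to-cost rule `snr_costs` (with its parameters `ref_snr_db` and `alpha`) is hypothetical, as are all function names. Method (i) is illustrated as a cost-weighted MSE, (ii) as stochastic replication of examples, (iii) as probabilistic discarding of low-cost examples, and the ensemble step is replaced by plain averaging of model outputs. None of these formulas are taken from the paper.

```python
import numpy as np


def snr_costs(snrs_db, ref_snr_db=10.0, alpha=0.1):
    """Hypothetical cost rule: 1.0 at or above ref_snr_db, growing by `alpha`
    per dB below it, so low-SNR utterances receive larger costs."""
    snrs_db = np.asarray(snrs_db, dtype=float)
    return 1.0 + alpha * np.maximum(0.0, ref_snr_db - snrs_db)


def cost_sensitive_mse(pred, target, costs):
    """Method (i), assumed form: per-utterance MSE weighted by cost and
    normalized by the total cost. pred/target: (num_utterances, num_features)."""
    per_utt = np.mean((pred - target) ** 2, axis=1)
    return float(np.sum(costs * per_utt) / np.sum(costs))


def oversample(indices, costs, rng):
    """Method (ii), assumed form: give each example floor(cost) copies plus one
    more with probability cost - floor(cost), so its expected count equals its cost."""
    base = np.floor(costs).astype(int)
    extra = (rng.random(len(costs)) < (costs - base)).astype(int)
    return np.repeat(indices, base + extra)


def undersample(indices, costs, rng):
    """Method (iii), assumed form: keep each example with probability
    cost / max(cost); the highest-cost (lowest-SNR) examples are always kept."""
    keep_prob = costs / costs.max()
    return indices[rng.random(len(indices)) < keep_prob]


def ensemble_average(predictions):
    """Stand-in for the ensemble step: element-wise mean of the outputs of the
    models trained with methods (i)-(iii)."""
    return np.mean(np.stack(predictions, axis=0), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    snrs = np.array([-5.0, 0.0, 5.0, 10.0])  # per-utterance training SNRs (dB)
    costs = snr_costs(snrs)                   # [2.5, 2.0, 1.5, 1.0]
    idx = np.arange(len(snrs))
    print("oversampled:", oversample(idx, costs, rng))
    print("undersampled:", undersample(idx, costs, rng))
```

In this sketch, oversampling and undersampling both shift the effective training distribution toward low-SNR data. Oversampling adds copies of costly examples, while undersampling removes cheap ones. The cost-weighted objective achieves a similar emphasis without changing the data. Stochastic rounding in `oversample` is one choice among many. It keeps the expected number of copies equal to the cost without biasing toward whole numbers.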