Discriminative Enhancement for Single Channel Audio Source Separation Using Deep Neural Networks

The sources separated by most single channel audio source separation techniques are usually distorted, and each separated source contains residual signals from the other sources. To tackle this problem, we propose to use deep neural networks (DNNs) to enhance the separated sources, reducing both distortion and interference. Two different DNNs are used in this work: the first separates the sources from the mixed signal, and the second enhances the separated signals. To capture the interactions between the separated sources, we use a single DNN to enhance all the separated sources together. To reduce the residual signals of one source in the other separated sources (interference), we train the enhancement DNN discriminatively to maximize the dissimilarity between the predicted sources. Experimental results show that discriminative enhancement decreases both the distortion and the interference between the separated sources.
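To illustrate the idea of discriminative training for the enhancement stage, the following is a minimal NumPy sketch of one plausible cost function: a squared-error term pulling each enhanced source toward its own reference, minus a weighted term pushing it away from the references of the other sources. The function name, variable names, and the trade-off weight `lam` are illustrative assumptions rather than the paper's exact notation or values.

```python
import numpy as np

def discriminative_cost(y_hat, y_ref, lam=0.05):
    """Sketch of a discriminative enhancement cost.

    y_hat : (num_sources, num_frames, num_bins) enhanced spectra from the DNN
    y_ref : (num_sources, num_frames, num_bins) reference (clean) spectra
    lam   : assumed weight of the dissimilarity (interference-reducing) term
    """
    num_sources = y_hat.shape[0]
    cost = 0.0
    for i in range(num_sources):
        # Similarity term: each enhanced source should match its own reference.
        cost += np.mean((y_hat[i] - y_ref[i]) ** 2)
        # Dissimilarity term: each enhanced source is pushed away from the
        # references of the other sources, discouraging residual interference.
        for j in range(num_sources):
            if j != i:
                cost -= lam * np.mean((y_hat[i] - y_ref[j]) ** 2)
    return cost

# Usage example: two sources, random magnitude-spectrogram frames.
rng = np.random.default_rng(0)
y_ref = rng.random((2, 100, 513))
y_hat = y_ref + 0.1 * rng.standard_normal((2, 100, 513))
print(discriminative_cost(y_hat, y_ref))
```

In a real system this scalar cost would be minimized with respect to the enhancement DNN's parameters (e.g., by backpropagation in a framework such as Theano, which the paper's references mention), rather than evaluated on fixed arrays as above.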
