Paper Information - Multi-Band Multi-Resolution Fully Convolutional Neural Networks for Singing Voice Separation

Deep neural networks with convolutional layers usually process the entire spectrogram of an audio signal with the same time-frequency resolution, number of filters, and dimensionality reduction scale. Following the rationale of the constant-Q transform, good features can be extracted from audio signals if the low frequency bands are processed with high frequency resolution filters and the high frequency bands with high time resolution filters. In the spectrogram of a mixture of singing voice and music signals, there is usually more information about the voice in the low frequency bands than in the high frequency bands. These two observations raise the need to process each part of the spectrogram differently. In this paper, we propose a multi-band multi-resolution fully convolutional neural network (MBR-FCN) for singing voice separation. The MBR-FCN processes the frequency bands that carry more information about the target signals with more filters and a smaller dimensionality reduction scale than the bands that carry less information. Furthermore, the MBR-FCN processes the low frequency bands with high frequency resolution filters and the high frequency bands with high time resolution filters. Our experimental results show that the proposed MBR-FCN, with very few parameters, achieves better singing voice separation performance than other deep neural networks.
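The band-wise idea described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' network: the two-band split, the band edges, the filter counts, the kernel shapes, and the pooling factors in `BANDS` are all made-up placeholder values, random kernels stand in for learned ones, and the helper names (`conv2d_same`, `max_pool`, `process_bands`) are hypothetical. The only point it tries to show is that each frequency band can get its own number of filters, its own kernel shape, and its own dimensionality reduction scale.

```python
import numpy as np


def conv2d_same(x, kernel):
    """Naive zero-padded 2-D cross-correlation; output has the shape of x."""
    kf, kt = kernel.shape
    top, left = kf // 2, kt // 2
    padded = np.pad(x, ((top, kf - 1 - top), (left, kt - 1 - left)))
    n_f, n_t = x.shape
    out = np.zeros(x.shape, dtype=float)
    for i in range(kf):
        for j in range(kt):
            out += kernel[i, j] * padded[i:i + n_f, j:j + n_t]
    return out


def max_pool(x, pool_f, pool_t):
    """Non-overlapping max pooling over (frequency, time); trims any remainder."""
    n_f, n_t = x.shape[0] // pool_f, x.shape[1] // pool_t
    trimmed = x[:n_f * pool_f, :n_t * pool_t]
    return trimmed.reshape(n_f, pool_f, n_t, pool_t).max(axis=(1, 3))


# Placeholder per-band settings (assumptions, not values from the paper).
# The "low" band gets more filters and a smaller pooling factor than "high";
# the kernel shapes (frequency, time) simply differ per band for illustration.
BANDS = {
    "low": {"bins": slice(0, 32), "n_filters": 8, "kernel": (5, 3), "pool": (2, 2)},
    "high": {"bins": slice(32, 64), "n_filters": 2, "kernel": (3, 5), "pool": (4, 4)},
}


def process_bands(spectrogram, bands, rng):
    """Apply band-specific filters and pooling to a (frequency, time) spectrogram."""
    features = {}
    for name, cfg in bands.items():
        band = spectrogram[cfg["bins"], :]
        maps = []
        for _ in range(cfg["n_filters"]):
            kernel = rng.standard_normal(cfg["kernel"])  # stand-in for a learned filter
            activation = np.maximum(conv2d_same(band, kernel), 0.0)  # ReLU
            maps.append(max_pool(activation, *cfg["pool"]))
        features[name] = np.stack(maps)  # (n_filters, frequency, time)
    return features


rng = np.random.default_rng(0)
spectrogram = np.abs(rng.standard_normal((64, 32)))  # toy magnitude spectrogram
features = process_bands(spectrogram, BANDS, rng)
for name, maps in features.items():
    print(name, maps.shape)
# low (8, 16, 16)
# high (2, 8, 8)
```

In this toy setting the low band keeps more feature maps at a finer grid than the high band, which mirrors the abstract's statement about filter count and dimensionality reduction scale; the actual band layout and filter shapes would have to be taken from the paper itself.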

Fei Zhao | Mark D. Plumbley | Emad M. Grais
