Bespoke Neural Networks for Score-Informed Source Separation

In this paper, we introduce a simple method that can separate arbitrary musical instruments from an audio mixture. Given an unaligned MIDI transcription for a target instrument in an input mixture, we synthesize new mixtures from the MIDI transcription that sound similar to the mixture to be separated. This lets us create a labeled training set for training a network on this specific, bespoke task. When the resulting model is applied to the original mixture, we demonstrate that this method can: 1) successfully separate the desired instrument with access to only unaligned MIDI, 2) separate arbitrary instruments, and 3) produce results in a fraction of the time of existing methods. We encourage readers to listen to the demos posted here: this https URL.
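The core idea of generating a labeled training set from an unaligned transcription can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the sine-tone renderer is a toy stand-in for a real MIDI synthesizer, and the note lists, `synth_notes`, and `make_training_pair` names are all hypothetical.

```python
import numpy as np

SR = 16000  # sample rate in Hz (assumed for this sketch)

def synth_notes(notes, dur):
    """Render (onset_sec, length_sec, midi_pitch) events as sine tones —
    a toy stand-in for synthesizing an instrument from a MIDI part."""
    out = np.zeros(int(dur * SR))
    for onset, length, pitch in notes:
        freq = 440.0 * 2 ** ((pitch - 69) / 12)  # MIDI pitch -> Hz
        t = np.arange(int(length * SR)) / SR
        start = int(onset * SR)
        seg = 0.5 * np.sin(2 * np.pi * freq * t)
        out[start:start + len(seg)] += seg[:len(out) - start]
    return out

def make_training_pair(target_notes, accomp_notes, dur=2.0):
    """Synthesize a (mixture, isolated target) pair; many such pairs
    form the labeled set used to train a bespoke separation network."""
    target = synth_notes(target_notes, dur)
    accomp = synth_notes(accomp_notes, dur)
    return target + accomp, target

# Hypothetical example: two-note target part over a sustained bass note.
mix, tgt = make_training_pair(
    target_notes=[(0.0, 0.5, 60), (0.5, 0.5, 64)],
    accomp_notes=[(0.0, 1.0, 48)],
)
```

Because every mixture is constructed by summing known sources, the isolated target is available as ground truth for supervised training, even though the original recording was never annotated.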
