Towards robust music source separation on loud commercial music

Commercial music today is mastered far louder, and with a much more heavily compressed dynamic range, than in the past. Music source separation research has not thoroughly accounted for these characteristics, which creates a domain mismatch between laboratory data and real-world music. In this paper, we confirm that this domain mismatch negatively affects the performance of music source separation networks. To this end, we first create two out-of-domain evaluation datasets, musdb-L and musdb-XL, by mimicking the music mastering process. We then quantitatively verify that the performance of state-of-the-art algorithms deteriorates significantly on these datasets. Lastly, we propose LimitAug, a data augmentation method that reduces the domain mismatch by applying an online limiter during training data sampling. We confirm that it not only alleviates the performance degradation on our out-of-domain datasets but also yields higher performance on in-domain data.
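To make the idea of an online limiter in the data sampling step concrete, the following is a minimal sketch and not the authors' implementation. It assumes mono stems stored as a NumPy array, a random input boost to emulate loud mastering, and a simple feed-forward peak limiter with instant attack and exponential release. The names `limiter_gain` and `limit_aug`, the boost range, the threshold, and the release constant are all hypothetical choices made for illustration.

```python
import numpy as np


def limiter_gain(mix, threshold=0.5, release=0.999):
    """Per-sample gain of a simple peak limiter (illustrative, mono input).

    Attack is instantaneous, so the limited signal never exceeds the
    threshold; the gain recovers towards 1.0 exponentially afterwards.
    """
    gain = np.ones_like(mix, dtype=np.float64)
    g = 1.0
    for n, p in enumerate(np.abs(mix)):
        # Gain that would bring this sample exactly to the threshold.
        target = min(1.0, threshold / p) if p > 0 else 1.0
        if target < g:
            g = target  # instant attack
        else:
            g = release * g + (1.0 - release) * target  # smooth release
        gain[n] = g
    return gain


def limit_aug(stems, rng, boost_db_range=(0.0, 12.0), threshold=0.5):
    """Assumed augmentation step: boost, limit the mixture, rescale stems.

    stems: array of shape (n_sources, n_samples).
    Returns (limited_mixture, scaled_stems). The same time-varying gain is
    applied to every stem, so the stems still sum to the limited mixture.
    """
    boost = 10.0 ** (rng.uniform(*boost_db_range) / 20.0)
    stems = stems * boost
    mix = stems.sum(axis=0)
    gain = limiter_gain(mix, threshold)
    return mix * gain, stems * gain
```

Applying the limiter gain to the stems as well as the mixture keeps the training targets consistent with the input, which is one plausible design choice; the source text does not specify how the targets are treated.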
