Towards Musically Meaningful Explanations Using Source Separation

Deep neural networks (DNNs) are successfully applied to a wide variety of music information retrieval (MIR) tasks. Such models are usually considered "black boxes", meaning that their predictions are not interpretable. Prior work on explainable models in MIR has generally used image processing tools to produce explanations for DNN predictions, but these are not necessarily musically meaningful, nor can they be listened to (which, arguably, is important in music). We propose audioLIME, a method based on Local Interpretable Model-agnostic Explanations (LIME), extended by a musical definition of locality. LIME learns locally linear models on perturbations of the example to be explained. Instead of extracting components of the spectrogram via image segmentation, as in the standard LIME pipeline, we propose using source separation. Perturbations are created by switching the estimated sources on and off, which makes our explanations listenable. We first validate audioLIME on a classifier that was deliberately trained to confuse the true target with a spurious signal, and show that this confusion is easily detected with our method. We then show that audioLIME passes a sanity check that many popular explanation methods fail. Finally, we demonstrate the general applicability of our (model-agnostic) method by explaining the predictions of a third-party music tagger.
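
To make the perturbation scheme concrete, the following is a minimal sketch of the idea described above, not the authors' implementation. It assumes two hypothetical callables, separate_sources (standing in for a source-separation front end such as Spleeter) and predict_tag_probability (the black-box model being explained); the interpretable representation is a binary vector indicating which sources are switched on, as in LIME.

import numpy as np
from sklearn.linear_model import Ridge


def explain(mix, separate_sources, predict_tag_probability, n_samples=500, seed=0):
    """Fit a locally linear surrogate over on/off combinations of separated sources."""
    sources = separate_sources(mix)           # hypothetical: list of source waveforms
    k = len(sources)
    rng = np.random.default_rng(seed)

    # Binary interpretable representation: which sources are switched on.
    masks = rng.integers(0, 2, size=(n_samples, k))
    masks[0] = 1                               # include the full mix itself

    # Re-mix each perturbation and query the black-box model (hypothetical callable).
    preds = np.array([
        predict_tag_probability(
            np.sum([s for s, on in zip(sources, m) if on], axis=0)
            if m.any() else np.zeros_like(mix))
        for m in masks
    ])

    # Weight samples by proximity to the original (all-sources-on) instance.
    weights = np.exp(-(k - masks.sum(axis=1)) / k)

    # The surrogate's coefficients rank the sources by their influence on the prediction.
    surrogate = Ridge(alpha=1.0).fit(masks, preds, sample_weight=weights)
    return surrogate.coef_

Because each interpretable component is a separated source rather than a spectrogram segment, the highest-weighted components can be rendered back to audio and listened to directly.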
