Towards Explaining Expressive Qualities in Piano Recordings: Transfer of Explanatory Features Via Acoustic Domain Adaptation

Emotion and expressivity in music have been topics of considerable interest in the field of music information retrieval. In recent years, mid-level perceptual features have been suggested as a means of explaining computational predictions of musical emotion. We find that the diversity of musical styles and genres in the dataset available for learning these features is insufficient for models to generalise well to specialised acoustic domains such as solo piano music. In this work, we show that unsupervised domain adaptation, combined with receptive-field regularised deep neural networks, significantly improves generalisation to this domain. Additionally, we demonstrate that our domain-adapted models better predict and explain expressive qualities in classical piano performances, as perceived and described by human listeners.
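The abstract does not spell out the adaptation mechanism, but unsupervised domain adaptation of this kind is commonly implemented with a gradient reversal layer in the style of Ganin and Lempitsky. Below is a minimal PyTorch sketch assuming that setup; the DomainAdaptedModel class, its layer sizes, the default of seven mid-level outputs, and the lambd weighting are illustrative placeholders, not the authors' architecture.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # Identity in the forward pass; scales gradients by -lambd in the
        # backward pass, so the shared feature extractor learns to confuse
        # the domain classifier while still serving the regression head.
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output.neg() * ctx.lambd, None

    class DomainAdaptedModel(nn.Module):  # hypothetical name and architecture
        def __init__(self, n_features=128, n_midlevel=7, lambd=1.0):
            super().__init__()
            self.lambd = lambd
            # Stand-in feature extractor; the paper uses a receptive-field
            # regularised CNN over spectrograms rather than this small MLP.
            self.features = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
            self.regressor = nn.Linear(64, n_midlevel)  # mid-level feature head
            self.domain_clf = nn.Linear(64, 2)          # source vs. target domain

        def forward(self, x):
            h = self.features(x)
            midlevel = self.regressor(h)  # supervised on labelled source data
            domain = self.domain_clf(GradReverse.apply(h, self.lambd))
            return midlevel, domain

In training under this scheme, labelled source examples (mixed-genre audio with mid-level annotations) contribute to both the regression and domain losses, while unlabelled target examples (solo piano recordings) contribute only to the domain loss; the reversed gradients push the feature extractor towards domain-invariant representations.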
