Towards Deep Modeling of Music Semantics using EEG Regularizers

Modeling music audio semantics has previously been tackled by learning mappings from audio data to high-level tags or to latent unsupervised spaces. The resulting semantic spaces are inherently limited, either because the chosen high-level tags do not cover all of music semantics or because audio data alone is not enough to determine music semantics. In this paper, we propose a generic framework for semantics modeling that focuses on the perception of the listener, captured through EEG data, in addition to the audio data. We implement this framework with a novel end-to-end 2-view Neural Network (NN) architecture and a Deep Canonical Correlation Analysis (DCCA) loss function that forces the semantic embedding spaces of the two views to be maximally correlated. We also detail how the EEG dataset was collected and use it to train our proposed model. We evaluate the learned semantic space in a transfer-learning context, using it as an audio feature extractor on an independent dataset and proxy task: music audio-lyrics cross-modal retrieval. We show that our embedding model outperforms Spotify features and performs comparably to a state-of-the-art embedding model trained on 700 times more data. Finally, we discuss refinements to the model that are likely to further improve its performance.
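
To make the architecture concrete, below is a minimal sketch, assuming PyTorch, of how the 2-view network and the DCCA objective fit together. The encoder stacks (`audio_enc`, `eeg_enc`) and the names `TwoViewNet` and `dcca_loss` are hypothetical stand-ins rather than the paper's actual audio and EEG towers; the loss follows the standard DCCA formulation of Andrew et al. (2013), which maximizes the total correlation between the two batches of embeddings.

```python
# Minimal sketch of a 2-view network trained with a DCCA loss.
# NOTE: placeholder encoders, not the paper's actual audio/EEG towers.
import torch
import torch.nn as nn

class TwoViewNet(nn.Module):
    """Maps audio and EEG inputs into a shared d-dimensional space."""
    def __init__(self, audio_dim, eeg_dim, embed_dim=32):
        super().__init__()
        # Hypothetical stand-in encoders (one hidden layer each).
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU(),
                                       nn.Linear(256, embed_dim))
        self.eeg_enc = nn.Sequential(nn.Linear(eeg_dim, 256), nn.ReLU(),
                                     nn.Linear(256, embed_dim))

    def forward(self, audio, eeg):
        return self.audio_enc(audio), self.eeg_enc(eeg)

def dcca_loss(h1, h2, eps=1e-4):
    """Negative total correlation between two views (Andrew et al., 2013).

    h1, h2: (batch, d) embeddings of paired audio/EEG examples.
    """
    n, d = h1.shape
    h1 = h1 - h1.mean(0, keepdim=True)
    h2 = h2 - h2.mean(0, keepdim=True)
    eye = torch.eye(d, device=h1.device)
    # Regularized covariance estimates.
    s11 = h1.T @ h1 / (n - 1) + eps * eye
    s22 = h2.T @ h2 / (n - 1) + eps * eye
    s12 = h1.T @ h2 / (n - 1)
    # Inverse square roots via eigendecomposition (s11, s22 are symmetric PSD).
    e1, v1 = torch.linalg.eigh(s11)
    e2, v2 = torch.linalg.eigh(s22)
    s11_inv_sqrt = v1 @ torch.diag(e1.clamp_min(eps).rsqrt()) @ v1.T
    s22_inv_sqrt = v2 @ torch.diag(e2.clamp_min(eps).rsqrt()) @ v2.T
    t = s11_inv_sqrt @ s12 @ s22_inv_sqrt
    # Total correlation = sum of singular values of T; minimize its negative.
    return -torch.linalg.svdvals(t).sum()
```

In training, paired audio/EEG minibatches drive both towers toward maximally correlated embeddings; once trained, the audio tower alone can be frozen and reused as a feature extractor, which is how the learned space would then serve the downstream audio-lyrics retrieval task.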
