Mood Classification Using Listening Data

The mood of a song is a highly relevant feature for exploration and recommendation in large collections of music, and the size of such collections calls for automatic methods to predict it. In this work, we show that listening-based features outperform content-based ones for mood classification: embeddings obtained through matrix factorization of listening data appear to be more informative of a track's mood than embeddings based on its audio content. To demonstrate this, we compile a subset of the Million Song Dataset, totalling 67k tracks, with expert annotations of 188 different moods collected from AllMusic. Our results on this novel dataset not only expose the limitations of current audio-based models, but also aim to foster further reproducible research on this timely topic.
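
As a minimal illustration of the pipeline described above, the sketch below factorizes a user–track play-count matrix with the `implicit` library's ALS (weighted matrix factorization for implicit feedback) and trains a multi-label classifier on the resulting track embeddings. The toy matrices, library choices, hyperparameters, and the one-vs-rest logistic-regression classifier are assumptions made for illustration only; they are not the exact setup used in the paper.

```python
# Hypothetical sketch (not the paper's exact code): listening-based track
# embeddings via implicit-feedback ALS, then multi-label mood classification.
import numpy as np
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares   # weighted MF for implicit feedback
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Toy stand-ins for the real data: a sparse user x track play-count matrix
# and a binary track x mood label matrix (the paper uses 188 mood classes).
rng = np.random.default_rng(0)
n_users, n_tracks, n_moods = 500, 200, 10
plays = (sp.random(n_users, n_tracks, density=0.05, random_state=0) * 20).astype(np.float32).tocsr()
mood_labels = (rng.random((n_tracks, n_moods)) < 0.1).astype(int)

# 1) Listening-based embeddings via ALS matrix factorization.
#    Note: depending on the implicit-library version, fit() expects either a
#    user x item or an item x user matrix; adjust the orientation accordingly.
mf = AlternatingLeastSquares(factors=32, regularization=0.01, iterations=15)
mf.fit(plays)
track_embeddings = mf.item_factors            # one embedding vector per track

# 2) Multi-label mood classifier on top of the embeddings
#    (a simple one-vs-rest logistic regression as a stand-in for the paper's model).
X_train, X_test, y_train, y_test = train_test_split(
    track_embeddings, mood_labels, test_size=0.2, random_state=0)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Predicted per-mood probabilities for the held-out tracks.
scores = clf.predict_proba(X_test)            # shape: (n_test_tracks, n_moods)
print("mood scores for first test track:", np.round(scores[0], 2))
```

The same classifier can be trained on audio-based embeddings (e.g. from a pretrained tagging network) in place of the ALS track factors, which is how the listening-based and content-based representations are compared.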
