The acoustic emotion Gaussians model for emotion-based music annotation and retrieval

One of the most exciting yet challenging endeavors in music research is to develop a computational model that comprehends the affective content of music signals and organizes a music collection according to emotion. In this paper, we propose a novel acoustic emotion Gaussians (AEG) model that defines a proper generative process of emotion perception in music. As a generative model, AEG permits easy and straightforward interpretation of the model learning process. To bridge the acoustic feature space and the music emotion space, a set of latent feature classes, learned from data, is introduced to perform the end-to-end semantic mapping between the two spaces. Based on this space of latent feature classes, the AEG model is applicable to both automatic music emotion annotation and emotion-based music retrieval. To provide insight into the AEG model, we also illustrate its learning process. A comprehensive performance study on two emotion-annotated music corpora, MER60 and MTurk, demonstrates that AEG outperforms state-of-the-art methods in automatic music emotion annotation. Moreover, we report, for the first time, a quantitative evaluation of emotion-based music retrieval.
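For intuition, the following is a minimal sketch (in Python/NumPy) of how an AEG-style pipeline could connect the two spaces, assuming the structure the abstract suggests: a Gaussian mixture over acoustic features yields posterior weights over the latent feature classes; each class carries a learned two-dimensional Gaussian in the valence-arousal (VA) plane; and the posterior-weighted VA mixture serves both as the annotation output and as a retrieval score. All parameter values, function names (class_posterior, annotate, retrieval_score), and the toy data are illustrative assumptions, not the authors' implementation.

# Illustrative sketch of AEG-style annotation and retrieval, under the
# assumptions stated above; parameters are random stand-ins, not trained values.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

K, D = 4, 20  # number of latent feature classes, acoustic feature dimension

# "Learned" model parameters (random placeholders for trained values).
feat_means = rng.normal(size=(K, D))          # acoustic Gaussian means
feat_cov = np.eye(D)                          # shared acoustic covariance
class_prior = np.full(K, 1.0 / K)             # p(k)
va_means = rng.uniform(-1, 1, size=(K, 2))    # per-class VA Gaussian means
va_covs = np.stack([0.1 * np.eye(2)] * K)     # per-class VA covariances

def class_posterior(x):
    """Posterior p(k | x) over latent feature classes for clip features x."""
    log_lik = np.array([
        multivariate_normal.logpdf(x, feat_means[k], feat_cov)
        for k in range(K)
    ])
    w = np.log(class_prior) + log_lik
    w -= w.max()                               # stabilize the softmax
    w = np.exp(w)
    return w / w.sum()

def annotate(x):
    """Annotation: predicted emotion distribution as a VA Gaussian mixture,
    returned as (mixture weights, component means, component covariances)."""
    return class_posterior(x), va_means, va_covs

def retrieval_score(query_va, x):
    """Retrieval: likelihood of a query VA point under a clip's predicted
    emotion mixture; higher means a better match to the emotion query."""
    theta, means, covs = annotate(x)
    return sum(theta[k] * multivariate_normal.pdf(query_va, means[k], covs[k])
               for k in range(K))

# Rank a toy collection of clips for a high-valence, high-arousal query.
clips = rng.normal(size=(5, D))
query = np.array([0.8, 0.6])
ranking = sorted(range(len(clips)),
                 key=lambda i: retrieval_score(query, clips[i]),
                 reverse=True)
print("ranked clip indices:", ranking)

In this reading, annotation returns a full distribution over the VA plane rather than a point estimate, which is what lets the same model score clips against an emotion query at retrieval time.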
