Modeling and Predicting Emotion in Music

Human emotional responses to music are dynamic processes that evolve over time in synchrony with the observed music signal. Because of this dynamic nature, systems that seek to predict emotion in music must analyze such processes over short time intervals, modeling not only the relationships between acoustic data and emotion parameters but also how those relationships evolve over time. In this work, we discuss modeling such relationships using a conditional random field (CRF), a graphical model trained to predict the conditional probability p(y|x) of a sequence of labels y given a sequence of features x. We train our graphical model on the emotional responses of individual annotators in an 11×11 quantized representation of the arousal-valence (A-V) space. Our model is fully connected and produces an estimate of the conditional probability for each A-V bin, allowing us to represent complex emotion-space distributions (e.g., multimodal ones) as an A-V heatmap. In selecting acoustic features for music emotion recognition, we discuss the application of regression-based deep belief networks (DBNs) to learn features directly from magnitude spectra. These features are optimized specifically for the prediction of emotion, and the trained models can potentially provide new insight into the relationships between music and emotion.
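To make the CRF formulation concrete, the following is a minimal, illustrative sketch rather than the authors' implementation: a linear-chain CRF over the 121 bins of the 11×11 A-V grid, where per-frame marginals p(y_t = k | x) are computed with the forward-backward algorithm and a single frame's marginals can be reshaped into the A-V heatmap described above. The unary and transition log-potentials, array shapes, and the quantization range are all assumptions made for illustration.

```python
# A linear-chain CRF over the 11x11 quantized A-V grid: per-frame marginal
# probabilities over the 121 bins, computed by forward-backward in log space.
# Potentials and shapes are illustrative, not the authors' implementation.
import numpy as np
from scipy.special import logsumexp

GRID = 11            # 11x11 quantization of the arousal-valence plane
K = GRID * GRID      # 121 discrete emotion states

def quantize_av(arousal, valence):
    """Map continuous A-V annotations in [-1, 1] to a bin index in [0, K)."""
    a = np.clip((np.asarray(arousal) + 1.0) / 2.0 * GRID, 0, GRID - 1).astype(int)
    v = np.clip((np.asarray(valence) + 1.0) / 2.0 * GRID, 0, GRID - 1).astype(int)
    return a * GRID + v

def crf_marginals(unary, transition):
    """Marginals p(y_t = k | x) for a linear-chain CRF.

    unary:      (T, K) log-potentials derived from acoustic features
    transition: (K, K) log-potentials for moving between A-V bins
    returns:    (T, K) marginal probabilities; each row sums to 1
    """
    T = unary.shape[0]
    fwd = np.zeros((T, K))
    bwd = np.zeros((T, K))
    fwd[0] = unary[0]
    for t in range(1, T):                                  # forward pass
        fwd[t] = unary[t] + logsumexp(fwd[t - 1][:, None] + transition, axis=0)
    for t in range(T - 2, -1, -1):                         # backward pass
        bwd[t] = logsumexp(transition + (unary[t + 1] + bwd[t + 1])[None, :], axis=1)
    log_marg = fwd + bwd
    log_marg -= logsumexp(log_marg, axis=1, keepdims=True)  # normalize per frame
    return np.exp(log_marg)

# Example: random scores for a 100-frame clip; frame 50's marginals form
# an 11x11 A-V heatmap.
rng = np.random.default_rng(0)
heatmap = crf_marginals(rng.normal(size=(100, K)),
                        0.1 * rng.normal(size=(K, K)))[50].reshape(GRID, GRID)
```

The DBN feature-learning idea can be sketched in the same spirit: a single Gaussian-Bernoulli RBM layer trained with one-step contrastive divergence (CD-1) on standardized frames of magnitude spectra, with the hidden-unit activations serving as learned features. A full regression-based DBN would stack several such layers and fine-tune them against emotion targets; the layer size, learning rate, and epoch count below are hypothetical placeholders.

```python
# One Gaussian-Bernoulli RBM layer trained with CD-1 on standardized
# magnitude-spectrum frames V of shape (N, D). Hidden activations are the
# learned features; hyperparameters are illustrative assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_cd1(V, n_hidden=50, lr=1e-3, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    N, D = V.shape
    W = 0.01 * rng.normal(size=(D, n_hidden))
    b_v, b_h = np.zeros(D), np.zeros(n_hidden)
    for _ in range(epochs):
        h_prob = sigmoid(V @ W + b_h)                    # positive phase
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_rec = h_samp @ W.T + b_v                       # Gaussian visibles: mean reconstruction
        h_rec = sigmoid(v_rec @ W + b_h)                 # negative phase
        W += lr * (V.T @ h_prob - v_rec.T @ h_rec) / N   # CD-1 weight update
        b_v += lr * (V - v_rec).mean(axis=0)
        b_h += lr * (h_prob - h_rec).mean(axis=0)
    return W, b_h  # learned features: sigmoid(frames @ W + b_h)
```

In such a pipeline, these learned features would replace or augment hand-crafted descriptors as the inputs from which the CRF's unary potentials are derived.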
