Bridging the semantic gap between pop music acoustic features and emotion: Building an interpretable model

Music emotion recognition (MER) is an important topic in music understanding, recommendation, retrieval, and human-computer interaction. Machine learning methods have achieved great success in estimating human emotional responses to music, yet few of them pay much attention to the semantic interpretation of those responses. In this work, we first train an interpretable model linking acoustic audio features to emotion. Filter, wrapper, and shrinkage methods are applied to select important features, and statistical models are then used to build and explain the emotion model. Extensive experimental results show that the shrinkage methods outperform the wrapper and filter methods on arousal. In addition, we observe that only a small subset of the extracted features has a strong effect on arousal, whereas most of the extracted features contribute little to perceived valence. As a result, we obtain a higher average accuracy for arousal than for valence.
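
To make the described pipeline concrete, the following is a minimal sketch of the shrinkage-based feature selection step applied to an arousal regression, written with scikit-learn. The feature names, the synthetic data, and the specific estimator (LassoCV) are illustrative assumptions for this sketch, not the paper's actual feature set or implementation.

```python
# Minimal sketch of shrinkage-based (Lasso) feature selection for arousal,
# assuming scikit-learn. The feature names and data below are synthetic
# placeholders, not the paper's actual acoustic descriptors or annotations.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical acoustic descriptors (e.g. as produced by an MIR toolbox).
feature_names = [
    "rms_energy", "tempo", "spectral_centroid", "spectral_flux",
    "zero_crossing_rate", "mfcc_1", "mfcc_2", "roughness", "key_clarity",
]

# Synthetic stand-in for an (n_clips x n_features) feature matrix and
# continuous arousal annotations.
X = rng.normal(size=(200, len(feature_names)))
y = 0.8 * X[:, 0] + 0.6 * X[:, 1] + 0.05 * rng.normal(size=200)

# Standardise features so the L1 penalty treats them comparably, then let
# cross-validated Lasso shrink uninformative coefficients to exactly zero.
X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)

selected = [(name, coef) for name, coef in zip(feature_names, lasso.coef_)
            if abs(coef) > 1e-6]
print("features retained by the Lasso:")
for name, coef in selected:
    print(f"  {name}: {coef:+.3f}")
```

Because the L1 penalty drives small coefficients exactly to zero, the coefficients that survive double as an interpretable ranking of which acoustic descriptors drive the arousal prediction.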
