A Study on Cross-cultural and Cross-dataset Generalizability of Music Mood Regression Models

The goal of music mood regression is to represent the emotional expression of music pieces as numerical values in a low-dimensional mood space and to predict those values automatically for unseen pieces. Existing studies on this topic usually train and test regression models on music datasets sampled from the same cultural source, annotated by people with the same cultural background, or otherwise constructed by the same method. In this study, we explore whether, and to what extent, regression models trained on samples from one dataset can predict the valence and arousal values of samples in another dataset. Specifically, we evaluate three datasets that differ in factors such as the cultural backgrounds of the stimuli (music) and subjects (annotators), the stimulus types, and the annotation methods. The results suggest that cross-cultural and cross-dataset predictions of both valence and arousal can achieve performance comparable to within-dataset predictions. We also discuss how dataset characteristics can affect the generalizability of regression models. The findings of this study may provide valuable insights into music mood regression for non-Western and other music for which training data are scarce.
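The cross-dataset protocol described above can be sketched as follows: fit a valence (or arousal) regressor on audio features from one dataset and evaluate it on a second dataset drawn from a different cultural source. This is a minimal illustrative sketch, not the paper's actual pipeline; the synthetic features, dataset sizes, the simulated annotation offset, and the choice of ridge regression are all assumptions for demonstration.

```python
# Hypothetical sketch of cross-dataset mood regression evaluation.
# Features, dataset sizes, and the simulated cultural offset are
# illustrative assumptions, not the datasets used in the study.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

def make_dataset(n, d=20, shift=0.0):
    """Simulate audio features X and valence annotations y.
    `shift` mimics a systematic offset in how a different
    annotator population uses the rating scale."""
    X = rng.normal(size=(n, d))
    w = np.linspace(1.0, 0.1, d)  # shared feature-to-mood mapping
    y = X @ w + shift + rng.normal(scale=0.5, size=n)
    return X, y

X_a, y_a = make_dataset(300)               # e.g. dataset A (one culture)
X_b, y_b = make_dataset(300, shift=0.3)    # e.g. dataset B (another culture)

# Train on dataset A only, then test on the held-out dataset B.
model = Ridge(alpha=1.0).fit(X_a, y_a)
r2_cross = r2_score(y_b, model.predict(X_b))
print(f"cross-dataset R^2: {r2_cross:.2f}")
```

If the feature-to-mood mapping transfers across datasets, the cross-dataset R² stays close to the within-dataset baseline; a large drop would indicate that cultural or methodological differences between the datasets dominate.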
