Bimodal Emotion Recognition Model for Minnan Songs

Most existing studies of emotion in Minnan songs approach the topic from the perspectives of music analysis theory and music appreciation; they do not explore the possibility of automatic emotion recognition. In this paper, we propose a model consisting of four main modules that classifies the emotion of Minnan songs using bimodal data: song lyrics and audio. In the proposed model, an attention-based Long Short-Term Memory (LSTM) neural network extracts lyrical features, and a Convolutional Neural Network (CNN) extracts audio features from the spectrum. The two kinds of extracted features are then fused by multimodal compact bilinear pooling, and the fused features are fed to a classification module that determines the song's emotion. We designed three groups of experiments to investigate the classification performance of combinations of the four main modules, to compare the proposed model with current approaches, and to examine the influence of several key parameters on recognition performance. The results show that the proposed model outperforms all other experimental groups; with an appropriate combination of parameters, its accuracy, precision, and recall all exceed 0.80.
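The fusion step named in the abstract, multimodal compact bilinear pooling, approximates the outer product of the lyric and audio feature vectors by convolving their Count Sketch projections in the frequency domain. The sketch below is a minimal, hypothetical NumPy illustration of that pooling operation only; the dimensions and the stand-in feature vectors are assumptions, not the paper's actual LSTM/CNN outputs or configuration.

```python
import numpy as np

def count_sketch(x, h, s, d):
    # Project x into R^d: each coordinate i of x is added to bucket h[i]
    # with a random sign s[i].
    out = np.zeros(d)
    np.add.at(out, h, s * x)
    return out

def mcb_pool(x, y, d=512, seed=0):
    # Multimodal compact bilinear pooling of two feature vectors:
    # approximate the outer product x (x) y in d dimensions by
    # multiplying the FFTs of the two Count Sketches (circular convolution).
    rng = np.random.default_rng(seed)
    hx = rng.integers(0, d, size=x.shape[0])
    sx = rng.choice([-1.0, 1.0], size=x.shape[0])
    hy = rng.integers(0, d, size=y.shape[0])
    sy = rng.choice([-1.0, 1.0], size=y.shape[0])
    fx = np.fft.rfft(count_sketch(x, hx, sx, d))
    fy = np.fft.rfft(count_sketch(y, hy, sy, d))
    return np.fft.irfft(fx * fy, n=d)

# Stand-in feature vectors (hypothetical sizes for the LSTM and CNN outputs).
lyric_feat = np.random.default_rng(1).standard_normal(128)
audio_feat = np.random.default_rng(2).standard_normal(256)
fused = mcb_pool(lyric_feat, audio_feat, d=512)
print(fused.shape)  # (512,)
```

The fused 512-dimensional vector would then be the input to the classification module; because Count Sketch is linear, the pooled output is bilinear in the two modality features.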
