A Multimodal Music Emotion Classification Method Based on Multifeature Combined Network Classifier

To address the shortcomings of single-network classification models, this paper applies a combined CNN-LSTM (convolutional neural network and long short-term memory) network to music emotion classification and proposes a multifeature combined network classifier based on it. The classifier passes 2D (two-dimensional) features through the CNN-LSTM and 1D (one-dimensional) features through a DNN (deep neural network), compensating for the deficiencies of single-feature models. The model uses multiple convolution kernels in the CNN for 2D feature extraction and a BiLSTM (bidirectional LSTM) for sequence processing, and it is applied separately to audio and to lyrics to produce single-modal emotion classification outputs. For audio feature extraction, the music audio is finely segmented and the vocals are separated out to obtain pure background sound clips, from which the spectrogram and LLDs (low-level descriptors) are extracted. For lyrics feature extraction, a chi-squared test vector and the word embeddings extracted by Word2vec serve as the feature representations of the lyrics. Combining the two heterogeneous feature types selected for audio and for lyrics through the classification model can improve classification performance. To fuse the emotional information of the two modalities, music audio and lyrics, this paper proposes a multimodal ensemble learning method based on stacking. Unlike existing feature-level and decision-level fusion methods, this method avoids the information loss caused by direct dimensionality reduction: the original features are converted into label results before fusion, which effectively solves the problem of feature heterogeneity. Experiments on the Million Song Dataset show that the multifeature combined network classifier reaches an audio classification accuracy of 68% and a lyrics classification accuracy of 74%. The average multimodal classification accuracy reaches 78%, a significant improvement over the single-modal results.
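The stacking-based fusion described above can be illustrated with a minimal sketch. It is not the paper's implementation: the lookup-table meta-learner, the function names `fit_meta_table` and `predict_meta`, the emotion labels, and the toy data are all assumptions made for illustration. The only idea taken from the abstract is that each modality's classifier first turns its features into a label, and a second-level learner fuses those labels rather than the heterogeneous raw features.

```python
from collections import Counter, defaultdict


def fit_meta_table(audio_labels, lyric_labels, true_labels):
    """Fit a lookup-table meta-learner on base-classifier label outputs.

    Hypothetical stand-in for a stacking meta-classifier: for every
    (audio label, lyrics label) pair seen in training, remember the most
    frequent true label. In practice the base labels should be
    out-of-fold predictions so the meta-learner does not see leaked labels.
    """
    votes = defaultdict(Counter)
    for audio, lyric, truth in zip(audio_labels, lyric_labels, true_labels):
        votes[(audio, lyric)][truth] += 1
    table = {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}
    # Fall back to the overall majority class for unseen label pairs.
    fallback = Counter(true_labels).most_common(1)[0][0]
    return table, fallback


def predict_meta(table, fallback, audio_label, lyric_label):
    """Fuse one audio label and one lyrics label into a final label."""
    return table.get((audio_label, lyric_label), fallback)


# Toy base-classifier outputs (assumed data, not taken from the paper).
audio_pred = ["happy", "sad", "happy", "sad", "happy", "sad", "sad"]
lyric_pred = ["happy", "sad", "sad", "sad", "happy", "happy", "sad"]
truth = ["happy", "sad", "sad", "sad", "happy", "happy", "sad"]

table, fallback = fit_meta_table(audio_pred, lyric_pred, truth)
fused = predict_meta(table, fallback, "happy", "sad")
print(fused)
```

Because the meta-learner only ever sees labels, the spectrogram, LLD, chi-squared, and Word2vec features never have to share a common dimensionality. A real system would likely replace the lookup table with a trained classifier over the base models' outputs.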
