Research on Multi-modal Music Emotion Classification Based on Audio and Lyrics

To address the low accuracy of music emotion classification, this paper proposes a new multi-modal fusion method for emotion classification based on audio and lyrics. First, Mel-frequency cepstral coefficients (MFCCs), the spectral centroid, and the frequency-band energy distribution are extracted as audio features, and an LSTM network is applied to classify music emotion from them. For the lyrics, a BERT model classifies the lyric text, and a sentiment dictionary is used to perform LFSM-based equalization on the lyric emotion classification results. Finally, a new fusion method is proposed that builds on traditional fusion approaches. Experimental results show that the new fusion method improves classification accuracy by 5.77% and 4.03% over the linear weighted multimodal fusion and LFSM fusion methods, respectively.
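
The paper itself gives no code, but the audio front end and the linear weighted fusion baseline it describes can be sketched as follows. This is a minimal illustration assuming librosa for feature extraction; the feature dimensions, band count, and fusion weight are placeholder choices, not values from the paper.

    import numpy as np
    import librosa

    def extract_audio_features(path, n_mfcc=20, n_bands=8):
        # Frame-level features named in the abstract: MFCCs, spectral
        # centroid, and a frequency-band energy distribution.
        y, sr = librosa.load(path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # (n_mfcc, T)
        centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # (1, T)
        power = np.abs(librosa.stft(y)) ** 2                      # (freq, T)
        # Split the power spectrogram into equal-width bands and
        # normalize each frame so the band energies form a distribution.
        bands = np.stack([b.sum(axis=0) for b in np.array_split(power, n_bands)])
        bands = bands / (bands.sum(axis=0, keepdims=True) + 1e-9) # (n_bands, T)
        # One feature vector per frame, usable as an LSTM input sequence.
        return np.concatenate([mfcc, centroid, bands], axis=0).T  # (T, n_mfcc+1+n_bands)

    def linear_weighted_fusion(p_audio, p_lyrics, w=0.5):
        # The baseline late fusion the paper compares against: a per-class
        # weighted sum of the audio and lyric classifiers' probabilities.
        return w * np.asarray(p_audio) + (1.0 - w) * np.asarray(p_lyrics)

The proposed fusion method replaces this fixed-weight combination, but the abstract does not specify its exact form, so only the baseline is shown here.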
