Melodic Phrase Attention Network for Symbolic Data-based Music Genre Classification (Student Abstract)

Compared with audio data-based music genre classification, research on symbolic data-based music genre classification is scarce. Existing methods generally rely on manually extracted features, which is time-consuming and laborious, and use traditional classifiers for label prediction without considering music-specific characteristics. To tackle this issue, we propose the Melodic Phrase Attention Network (MPAN) for symbolic data-based music genre classification. Our model is trained in three steps. First, we adopt representation learning, instead of traditional musical feature extraction, to obtain a vectorized representation of the music pieces. Second, the music pieces are divided into several melodic phrases through melody segmentation. Finally, the Melodic Phrase Attention Network is designed according to musical characteristics to identify how each melodic phrase reflects the music genre, thereby generating more accurate predictions. Experimental results show that our proposed method is superior to baseline symbolic data-based music genre classification approaches and achieves significant performance improvements on two large datasets.

Introduction

Existing music genre classification methods require manual feature extraction for different datasets and, because of the complexity of music, must extract a large number of musical features, which results in costly feature engineering. In addition, these methods typically use traditional algorithms (such as SVM) for classification, so they cannot make full use of complex musical features. In this paper, we propose a melodic phrase attention network for symbolic data-based music genre classification to improve classification performance.

Proposed Approach

Melody Segmentation and Representation

The melody segmentation method we use is the musical energy feature vector-based segmentation (Bingjie 2014), which defines four variables, namely PitchArea, VolumeArea, MelodyArea, and SoundArea. A music piece is divided into several melodic phrases using this method.

We use alphabetic letters with an octave indication (e.g., C4 for note C in the fourth octave, i.e., middle C) to represent the pitch of a note. After representing each music slice as a word, we apply a representation learning approach similar to word2vec to obtain the vectorized representation: for each music slice encoded as a word d_t in a corpus of size T, the model tries to predict the surrounding music slices within a window c.

Melodic Phrase Attention Network

Figure 1: Overall framework of the Melodic Phrase Attention Network.

Figure 1 illustrates the overall architecture of our Melodic Phrase Attention Network, which consists of a melodic phrase encoder and a melodic phrase attention module.

Melodic Phrase Encoder

In this module, we obtain the vectorized representation c_{it} of each music slice in the i-th melodic phrase, and the input of the encoder is represented as X_i = {c_{i1}, c_{i2}, ..., c_{ik}}. We apply a CNN as the melodic phrase encoder, composed of a 1D convolutional layer, a max-pooling layer, and a fully connected layer, and obtain the vectorized representation P_i of the i-th melodic phrase through this encoder.

Melodic Phrase Attention

To identify the melodic phrases that contribute to the classification task, we apply an attention mechanism over the phrase representations.
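As a rough illustration of the segmentation step, the following Python sketch places phrase boundaries wherever a per-window feature vector changes sharply. The four statistics computed here are simple placeholders standing in for PitchArea, VolumeArea, MelodyArea, and SoundArea, whose exact definitions are given in (Bingjie 2014); the window size and threshold are likewise assumed values.

```python
import numpy as np

def energy_features(window):
    """Placeholder 4-d 'musical energy' vector for a window of
    (pitch, velocity, duration) note tuples. These simple statistics
    only stand in for the four variables defined in (Bingjie 2014)."""
    pitches = np.array([n[0] for n in window], dtype=float)
    vols = np.array([n[1] for n in window], dtype=float)
    durs = np.array([n[2] for n in window], dtype=float)
    contour = np.abs(np.diff(pitches)).sum() if len(pitches) > 1 else 0.0
    return np.array([pitches.sum(), vols.sum(), contour, durs.sum()])

def segment(notes, win=8, threshold=1.5):
    """Place a phrase boundary wherever the normalized energy vector
    of adjacent windows changes sharply."""
    feats = np.array([energy_features(notes[i:i + win])
                      for i in range(0, len(notes) - win + 1, win)])
    feats = (feats - feats.mean(0)) / (feats.std(0) + 1e-8)  # normalize each variable
    boundaries = [0]
    for i in range(1, len(feats)):
        if np.linalg.norm(feats[i] - feats[i - 1]) > threshold:
            boundaries.append(i * win)  # note index where a new phrase starts
    return boundaries
```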
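The representation-learning step can be reproduced with an off-the-shelf skip-gram implementation. In the sketch below, only the C4-style pitch naming and the word2vec-style objective (predicting the slices surrounding each token d_t within a window c) come from the text above; the toy corpus and all hyperparameters (vector size, window, epochs) are assumptions for illustration.

```python
from gensim.models import Word2Vec

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def midi_to_token(midi_pitch: int) -> str:
    """Encode a MIDI pitch as a letter-plus-octave token, e.g. 60 -> 'C4'."""
    octave = midi_pitch // 12 - 1  # MIDI 60 (middle C) maps to octave 4
    return f"{NOTE_NAMES[midi_pitch % 12]}{octave}"

# Hypothetical corpus: each piece is a sequence of music-slice tokens.
pieces = [
    [midi_to_token(p) for p in [60, 62, 64, 65, 67, 69, 71, 72]],  # C major scale
    [midi_to_token(p) for p in [57, 60, 64, 60, 57, 55]],
]

# Skip-gram model: for each token, predict the surrounding tokens in a
# window, analogous to the word2vec-style objective described above.
model = Word2Vec(pieces, vector_size=64, window=5, min_count=1, sg=1, epochs=50)

print(model.wv['C4'][:5])  # first 5 dimensions of the embedding for middle C
```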
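A minimal PyTorch sketch of the melodic phrase encoder follows: a 1D convolution over the k slice embeddings, max pooling, and a fully connected layer, matching the three-layer structure described above. The filter count, kernel size, and output dimension are assumed, as the abstract does not specify them.

```python
import torch
import torch.nn as nn

class MelodicPhraseEncoder(nn.Module):
    """1D convolution -> max pooling -> fully connected layer, producing
    the phrase vector P_i from the slice embeddings {c_i1, ..., c_ik}.
    All dimensions here are illustrative assumptions."""
    def __init__(self, embed_dim=64, n_filters=128, kernel_size=3, phrase_dim=64):
        super().__init__()
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size, padding=1)
        self.fc = nn.Linear(n_filters, phrase_dim)

    def forward(self, x):
        # x: (batch, k, embed_dim) -- the k slice embeddings of one phrase
        h = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, n_filters, k)
        h = torch.max(h, dim=2).values                # global max pooling over slices
        return self.fc(h)                             # phrase vector P_i
```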
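Finally, a sketch of the attention module. The abstract does not give the scoring function, so additive (Bahdanau-style) attention over the phrase vectors P_i is used here as one common choice, followed by an assumed linear classifier over the attended piece representation.

```python
import torch
import torch.nn as nn

class MelodicPhraseAttention(nn.Module):
    """Additive attention over phrase vectors, weighting each phrase by its
    contribution to the genre prediction. The scoring function and the
    linear classifier are illustrative assumptions."""
    def __init__(self, phrase_dim=64, attn_dim=64, n_genres=10):
        super().__init__()
        self.proj = nn.Linear(phrase_dim, attn_dim)
        self.context = nn.Linear(attn_dim, 1, bias=False)
        self.classifier = nn.Linear(phrase_dim, n_genres)

    def forward(self, phrases):
        # phrases: (batch, n_phrases, phrase_dim) -- stacked P_i vectors
        scores = self.context(torch.tanh(self.proj(phrases)))  # (batch, n_phrases, 1)
        weights = torch.softmax(scores, dim=1)                 # phrase importance
        piece = (weights * phrases).sum(dim=1)                 # weighted piece vector
        return self.classifier(piece), weights.squeeze(-1)

# Example: 2 pieces, 5 phrases each, 64-dim phrase vectors.
logits, weights = MelodicPhraseAttention()(torch.randn(2, 5, 64))
```

The returned attention weights can be inspected directly, which is what lets the model indicate how strongly each melodic phrase reflects the predicted genre.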