Music genre classification using a hierarchical long short term memory (LSTM) model

This paper examines the application of Long Short Term Memory (LSTM) model in music genre classification. We explore two different approaches in the paper. (1) In the first method, we use one single LSTM to directly classify 6 different genres of music. The method is implemented and the results are shown and discussed. (2) The first approach is only good for 6 or less genres. So in the second approach, we adopt a hierarchical divide- and-conquer strategy to achieve 10 genres classification. In this approach, music is classified into strong and mild genre classes. Strong genre includes hiphop, metal, pop, rock and reggae because usually they have heavier and stronger beats. The mild class includes jazz, disco, country, classic and blues because they tend to be softer musically. We further divide the sub-classes into sub-subclasses to help with the classification. Firstly, we classify an input piece into strong or mild class. Then for each subclass, we further classify them until one of the ten final classes is identified. For the implementation, each subclass classification module is implemented using a LSTM. Our hierarchical divide-and-conquer idea is built and tested. The average classification accuracy of this approach for 10-genre classification is 50.00%, which is higher than the state-of-the-art approach that uses a single convolutional neural network. From our experimental results, we show that this hierarchical scheme improves the classification accuracy significantly.