A Middle-Level Learning Feature Interaction Method with Deep Learning for Multi-Feature Music Genre Classification

Music genre classification has become an active research area in recent years. Multi-feature models are widely acknowledged as a promising approach to this task. However, in most existing work the major branches of a multi-feature model are relatively independent and do not interact, which leaves the learned features insufficient for music genre classification. In view of this, we investigate how the interaction of learned features across different branches and layers of a multi-feature model affects the final classification results, and we propose a corresponding middle-level learning feature interaction method based on deep learning. Experimental results show that the proposed method significantly improves the accuracy of music genre classification: the best accuracy on the GTZAN dataset reaches 93.65%, which is superior to most current methods.
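To make the idea of middle-level feature interaction concrete, the following is a minimal numpy sketch of a two-branch model in which each branch's upper layer also consumes the other branch's middle-level features. All dimensions, the two input feature types, and the use of concatenation as the fusion operation are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def relu(x):
    # simple nonlinearity used between layers
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)

# Toy inputs for two branches (e.g. spectrogram-based and MFCC-based features);
# sizes are illustrative assumptions.
x_spec = rng.standard_normal((4, 128))  # batch of 4 clips, 128-dim features
x_mfcc = rng.standard_normal((4, 40))   # batch of 4 clips, 40-dim features

# Branch weights: each branch's second layer consumes its own 64 middle-level
# features plus the 64 exchanged from the other branch (hence 128 inputs).
w1a = rng.standard_normal((128, 64))
w1b = rng.standard_normal((40, 64))
w2a = rng.standard_normal((128, 32))
w2b = rng.standard_normal((128, 32))
w_cls = rng.standard_normal((64, 10))   # fused high-level features -> 10 GTZAN genres

# Middle-level learning features of each branch.
mid_a = relu(x_spec @ w1a)              # (4, 64)
mid_b = relu(x_mfcc @ w1b)              # (4, 64)

# Middle-level interaction: each branch's upper layer sees both its own and
# the other branch's middle-level features (concatenation as a simple fusion).
out_a = relu(np.concatenate([mid_a, mid_b], axis=1) @ w2a)  # (4, 32)
out_b = relu(np.concatenate([mid_b, mid_a], axis=1) @ w2b)  # (4, 32)

# High-level fusion and genre classification.
logits = np.concatenate([out_a, out_b], axis=1) @ w_cls     # (4, 10)
pred = logits.argmax(axis=1)
```

Without the interaction step, `w2a` and `w2b` would only see their own branch's 64-dimensional middle-level features; the concatenation is what lets gradient signal from one branch shape the other's intermediate representation.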
