Paper information - MediaEval 2019 Emotion and Theme Recognition task: A VQ-VAE Based Approach

In this paper, we, the Taiinn (Taiwan) team, use a pre-trained VQ-VAE as a feature extractor and compare two types of classifiers for audio-based emotion and theme recognition. The VQ-VAE is pre-trained on the Million Song Dataset (MSD). We found that keeping the pre-trained VQ-VAE parameters fixed while training the classifier yields a higher ROC-AUC than updating them along with the classifier. In addition, an embedding with a larger shape works better than its one-dimensional counterpart. The code and submitted models can be found at: https://github.com/annahung31/moodtheme-tagging.
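Results here are reported in ROC-AUC. The following is a minimal, illustrative sketch of how that metric can be computed for multi-label mood/theme tagging. It assumes binary per-tag labels, per-tag classifier scores, and macro-averaging over tags (the averaging scheme is an assumption, not stated in the abstract). It is not the authors' evaluation code, and the toy labels and scores are made up for illustration.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC for a single tag, using the pairwise definition.

    The AUC equals the probability that a randomly chosen positive clip
    receives a higher score than a randomly chosen negative clip, with
    ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative clip")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


def macro_roc_auc(label_matrix: List[List[int]], score_matrix: List[List[float]]) -> float:
    """Unweighted mean of per-tag ROC-AUC; rows are clips, columns are tags."""
    n_tags = len(label_matrix[0])
    aucs = []
    for t in range(n_tags):
        labels = [row[t] for row in label_matrix]
        scores = [row[t] for row in score_matrix]
        aucs.append(roc_auc(labels, scores))
    return sum(aucs) / n_tags


if __name__ == "__main__":
    # Hypothetical toy data: 4 clips, 2 tags.
    labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
    scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
    print(macro_roc_auc(labels, scores))  # prints 0.875 (tag AUCs 1.0 and 0.75)
```

The pairwise loop costs O(P·N) per tag, which keeps the definition transparent but is slow on large evaluation sets; in practice a rank-based implementation such as scikit-learn's `roc_auc_score` with `average="macro"` is the usual choice.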