4MuLA: A Multitask, Multimodal, and Multilingual Dataset of Music Lyrics and Audio Features

We present a new benchmark dataset of songs with structured information that can be applied to various machine learning tasks. The data comes from a platform focused on lyrics information, but it also contains several other annotations provided by its users. Our dataset, called 4MuLA (Multitask, Multimodal, and Multilingual Music Lyrics and Audio features dataset), includes features extracted from 96,458 songs by 15,310 artists across 76 genres. In particular, our dataset contains Latin music genres that are often overlooked in other benchmark datasets. For each track, we make available various acoustic features, extracted tags, and lyrics in English, Portuguese, or Spanish. With these features, researchers can use our dataset for, at least, lyrics-, audio-, or multimodal-based genre classification, music and artist similarity, and popularity regression. Moreover, the dataset supports cross- or multilingual text analysis of lyrics, such as discourse analysis or measuring the differences between the emotions conveyed by audio and by lyrics.
