GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music

Symbolic music datasets are important for music information retrieval and musical analysis. However, there is a lack of large-scale symbolic datasets for classical piano music. In this article, we create the GiantMIDI-Piano dataset, containing 10,854 unique piano solo pieces composed by 2,786 composers. The dataset is collected as follows: we extract music piece names and composer names from the International Music Score Library Project (IMSLP); we search for and download their corresponding audio recordings from the internet; we apply a convolutional neural network to detect piano solo pieces; and we transcribe those piano solo recordings into Musical Instrument Digital Interface (MIDI) files using our recently proposed high-resolution piano transcription system. Each transcribed MIDI file contains the onset, offset, pitch, and velocity attributes of piano notes, as well as the onset and offset attributes of sustain pedals. GiantMIDI-Piano contains 34,504,873 transcribed notes and includes metadata for each music piece. To our knowledge, GiantMIDI-Piano is the largest classical piano MIDI dataset to date. We analyse the statistics of GiantMIDI-Piano, including composer nationalities and the number and duration of works per composer. We show the chroma, interval, trichord, and tetrachord frequencies of six composers from different eras to demonstrate that GiantMIDI-Piano can be used for musical analysis. Our piano solo detection system achieves an accuracy of 89%, and the piano note transcription system achieves an onset F1 of 96.72% evaluated on the MAESTRO dataset. GiantMIDI-Piano achieves an alignment error rate (ER) of 0.154 with respect to manually input MIDI files, compared with MAESTRO's alignment ER of 0.061. We release the source code for acquiring the GiantMIDI-Piano dataset at this https URL.
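To make the structure of a transcribed file concrete, below is a minimal sketch (not the authors' code) of reading one GiantMIDI-Piano MIDI file with the pretty_midi library. The file path is a hypothetical placeholder; the note attributes (onset, offset, pitch, velocity) and the sustain-pedal events mirror what the abstract describes, with the pedal stored as MIDI control change 64.

```python
# Minimal sketch, assuming the pretty_midi package is installed and a
# transcribed piece is available at a hypothetical local path.
import pretty_midi

midi_path = "GiantMIDI-Piano/example_piece.mid"  # hypothetical path
pm = pretty_midi.PrettyMIDI(midi_path)

# Pieces in the dataset are piano solos, so a single instrument track is expected.
piano = pm.instruments[0]

# Note attributes: onset (start), offset (end), pitch, velocity.
for note in piano.notes[:5]:
    print(f"onset={note.start:.3f}s offset={note.end:.3f}s "
          f"pitch={note.pitch} velocity={note.velocity}")

# Sustain-pedal events are encoded as control change 64 (CC64);
# values >= 64 are conventionally interpreted as "pedal down".
pedal_events = [cc for cc in piano.control_changes if cc.number == 64]
for cc in pedal_events[:5]:
    state = "down" if cc.value >= 64 else "up"
    print(f"t={cc.time:.3f}s sustain pedal {state}")
```

The same loop can be extended to aggregate statistics across the dataset, e.g. counting notes per piece or computing chroma histograms for the kind of musical analysis discussed above.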
