ASAP: a dataset of aligned scores and performances for piano transcription

In this paper we present Aligned Scores and Performances (ASAP): a new dataset of 222 digital musical scores aligned with 1068 performances (more than 92 hours) of Western classical piano music. The scores are provided as paired MusicXML and quantized MIDI files; the performances are provided as MIDI files, a subset of which also have audio recordings. Scores and performances are aligned via downbeat, beat, time signature, and key signature annotations. ASAP was produced with a new annotation workflow that combines score analysis and alignment algorithms, with the goal of reducing the time needed for manual annotation. To our knowledge, the dataset is the largest to align music scores with MIDI and audio performance data. As such, it is a useful resource for a wide variety of MIR applications, from those targeting the complete audio-to-score Automatic Music Transcription task to those targeting more specific subtasks (e.g., key signature estimation, or beat and downbeat tracking from both MIDI and audio representations).
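To illustrate how such aligned beat and downbeat annotations might be consumed, the sketch below parses a hypothetical tab-separated annotation file in which each row holds a score time, the corresponding performance time, and a label (`db` for downbeat, `b` for plain beat). This column layout and the label names are assumptions made for illustration, not the dataset's documented format.

```python
import csv
import io

def parse_annotations(text):
    """Parse tab-separated alignment annotations.

    Each row is assumed to hold (score_time, performance_time, label),
    where label is 'db' for a downbeat and 'b' for an ordinary beat.
    This row layout is a hypothetical sketch, not the official format.
    Returns performance-time positions of all beats and of downbeats.
    """
    beats, downbeats = [], []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip malformed or empty rows
        perf_time, label = float(row[1]), row[2]
        beats.append(perf_time)
        if label == "db":
            downbeats.append(perf_time)
    return beats, downbeats

# Example: four annotated beats spanning one bar of 3/4 plus the next downbeat.
sample = "0.0\t0.12\tdb\n1.0\t0.61\tb\n2.0\t1.08\tb\n3.0\t1.55\tdb\n"
beats, downbeats = parse_annotations(sample)
print(len(beats), len(downbeats))  # 4 2
```

Keeping the beat list and downbeat list in performance time makes the annotations directly usable as ground truth for beat- and downbeat-tracking evaluation on the performance MIDI or audio.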
