High-Resolution Piano Transcription With Pedals by Regressing Onset and Offset Times

Automatic music transcription (AMT) is the task of transcribing audio recordings into symbolic representations such as the Musical Instrument Digital Interface (MIDI) format. Recently, neural-network-based methods have been applied to AMT and have achieved state-of-the-art results. However, most previous AMT systems predict the presence or absence of notes in the frames of audio recordings, so their transcription resolution is limited to the hop size between adjacent frames. In addition, previous AMT systems are sensitive to misaligned onset and offset labels in audio recordings. For high-resolution evaluation, previous works have not investigated AMT systems evaluated under different onset and offset tolerances. For piano transcription, there is a lack of research on building AMT systems that transcribe both notes and pedals. In this article, we propose a high-resolution AMT system trained by regressing the precise times of onsets and offsets. At inference, we propose an algorithm to analytically calculate the precise onset and offset times of note and pedal events. We build both note and pedal transcription systems with our high-resolution AMT system, and show that our system is more robust to misaligned onset and offset labels than previous systems. Our proposed system achieves an onset F1 score of 96.72% on the MAESTRO dataset, outperforming the onsets-and-frames system from Google, which achieves 94.80%. Our system achieves a pedal onset F1 score of 91.86%, the first benchmark result on the MAESTRO dataset. We release the source code of our work at this https URL
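The abstract's core idea, regressing continuous onset times instead of predicting binary frame labels, can be illustrated with a minimal sketch. The triangular target shape, the sharpness parameter `J`, the frame rate, and both helper functions below are assumptions for illustration, not the paper's exact formulation: each precise onset is encoded as a target that decays linearly with a frame's distance from the true onset time, and at inference the sub-frame onset position is recovered analytically from the regression values around the peak frame.

```python
import numpy as np

def onset_regression_targets(onset_times, num_frames, frames_per_second=100, J=5):
    """Hypothetical sketch: encode each precise onset time (in seconds) as a
    triangular regression target over nearby frames, instead of a single
    binary frame label. J controls how many frames the target spans."""
    targets = np.zeros(num_frames)
    for t in onset_times:
        center = t * frames_per_second  # fractional frame index of the onset
        lo = max(0, int(np.floor(center)) - J)
        hi = min(num_frames - 1, int(np.ceil(center)) + J)
        for i in range(lo, hi + 1):
            # Target decays linearly with distance from the true onset time.
            value = max(0.0, 1.0 - abs(i - center) / J)
            targets[i] = max(targets[i], value)
    return targets

def refine_onset_time(targets, frames_per_second=100, J=5):
    """Analytically recover a sub-frame onset time from regression outputs,
    assuming the triangular target shape above. For an exact triangle with
    peak offset d from frame i, targets[i+1] - targets[i-1] = 2*d / J,
    so d = J * (right - left) / 2."""
    i = int(np.argmax(targets))
    left = targets[i - 1] if i > 0 else 0.0
    right = targets[i + 1] if i < len(targets) - 1 else 0.0
    delta = J * (right - left) / 2.0
    return (i + delta) / frames_per_second
```

For example, an onset at 0.123 s with a 100 Hz frame rate falls at fractional frame 12.3; the binary-label resolution would be 10 ms, but the difference between the two neighbouring regression values recovers the 3 ms sub-frame offset exactly.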
