Towards Complete Polyphonic Music Transcription: Integrating Multi-Pitch Detection and Rhythm Quantization

Most work on automatic transcription produces “piano roll” data with no musical interpretation of the rhythm or pitches. We present a polyphonic transcription method that converts a music audio signal into a human-readable musical score by integrating multi-pitch detection and rhythm quantization methods. This integration is made difficult by the fact that multi-pitch detection produces erroneous notes, such as extra notes, and introduces timing errors that compound the temporal deviations due to musical expression. We therefore propose a rhythm quantization method that can remove extra notes by extending the metrical hidden Markov model, and we optimize the model parameters. We also improve the note-tracking process of multi-pitch detection by refining the treatment of repeated notes and the adjustment of onset times. Finally, we propose evaluation measures for transcribed scores. Systematic evaluations on commonly used classical piano data show that these treatments improve transcription performance; the results can serve as benchmarks for further studies.
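To make the rhythm-quantization idea concrete, the sketch below shows a toy metrical HMM decoded with the Viterbi algorithm. This is an illustrative simplification, not the paper's model: hidden states are positions on a 16th-note grid within one 4/4 bar, transitions advance by a note value of `k` grid units under a hypothetical geometric prior favoring short values, and each observed inter-onset interval (IOI) is modeled as a Gaussian around the nominal duration `k * step` at a fixed tempo. The function names, prior, and parameter values (`tempo_bpm`, `grid`, `sigma`) are assumptions chosen for the example.

```python
import numpy as np

def quantize_rhythm(onsets, tempo_bpm=120.0, grid=16, sigma=0.03):
    """Toy metrical-HMM rhythm quantizer (illustrative sketch only).

    Hidden state: metrical position on a 16th-note grid within one 4/4 bar.
    Transition: advance by k grid units (a note value), k = 1..grid,
    with a geometric prior over k. Observation: the IOI is Gaussian
    around k * step seconds. Viterbi decoding returns the most likely
    note values (in grid units) for each IOI.
    """
    step = 60.0 / tempo_bpm * 4.0 / grid          # seconds per 16th note
    iois = np.diff(np.asarray(onsets, dtype=float))
    n = len(iois)

    # Assumed prior over note values: shorter values slightly preferred.
    log_prior = np.array([-0.5 * k for k in range(1, grid + 1)])
    log_prior -= np.logaddexp.reduce(log_prior)   # normalize in log space

    def log_obs(ioi, k):
        # Gaussian log-likelihood of the observed IOI given note value k.
        d = ioi - k * step
        return -0.5 * (d / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

    delta = np.zeros(grid)                        # uniform start anywhere in the bar
    back = np.zeros((n, grid), dtype=int)         # best predecessor position
    for t in range(n):
        new = np.full(grid, -np.inf)
        for s_prev in range(grid):
            for k in range(1, grid + 1):
                s = (s_prev + k) % grid           # landing position in the bar
                score = delta[s_prev] + log_prior[k - 1] + log_obs(iois[t], k)
                if score > new[s]:
                    new[s] = score
                    back[t, s] = s_prev
        delta = new

    # Backtrace the most likely sequence of metrical positions.
    s = int(np.argmax(delta))
    path = [s]
    for t in range(n - 1, -1, -1):
        s = int(back[t, s])
        path.append(s)
    path.reverse()

    # Recover note values (grid units) from successive positions, k in 1..grid.
    return [((path[t + 1] - path[t] - 1) % grid) + 1 for t in range(n)]

# Expressively timed quarter notes at 120 BPM snap to 4 grid units each.
print(quantize_rhythm([0.0, 0.51, 0.98, 1.52]))   # -> [4, 4, 4]
```

The paper's extended model additionally allows "deleting" spurious extra notes produced by multi-pitch detection; a sketch of that would add skip/noise states to the HMM rather than forcing every onset onto the grid.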