A supervised classification approach for note tracking in polyphonic piano transcription

Abstract In Automatic Music Transcription, note tracking is a key stage in the overall success of the task, as it derives the expected note-level abstraction from a frame-based pitch activation representation. Despite its relevance, note tracking is most commonly performed with a set of hand-crafted rules manually tuned to the data at hand. The present work introduces a machine-learning approach, more precisely supervised classification, that aims at automatically inferring such policies for the case of piano music. The idea is to segment each pitch band of the frame-based pitch activation into individual instances, which are subsequently classified as active or non-active note events. Results obtained with a comprehensive set of supervised classification strategies on the MAPS piano data set show that the approach is competitive with other commonly considered note-tracking strategies and improves the F-measure over the considered baseline in both frame-level and note-level evaluations.
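
The sketch below illustrates the general pipeline the abstract describes: splitting each pitch band of a frame-based activation matrix into candidate segments and labelling each segment with a supervised classifier. It is a minimal illustration, not the authors' implementation; the fixed activation threshold, the per-segment features (duration, mean, maximum, and standard deviation), and the choice of a scikit-learn RandomForestClassifier as the classifier are all assumptions made here for clarity. Training labels would come from aligning candidate segments with ground-truth annotations (e.g., the MAPS MIDI files).

```python
# Minimal sketch (not the authors' implementation) of note tracking as
# supervised classification: each pitch band of a frame-based pitch
# activation matrix is split into candidate segments, simple features
# are extracted per segment, and a classifier labels each segment as an
# active or non-active note event. Threshold and features are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def segment_pitch_band(band, threshold=0.1):
    """Split one pitch band (1-D array over frames) into contiguous
    candidate segments where the activation exceeds a threshold."""
    active = band > threshold
    segments, start = [], None
    for t, flag in enumerate(active):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments


def segment_features(band, start, end):
    """Illustrative per-segment features: duration, mean, max, std."""
    seg = band[start:end]
    return [end - start, seg.mean(), seg.max(), seg.std()]


def extract_candidates(activation):
    """Collect feature vectors and (pitch, start, end) indices for all
    candidate segments in a pitch activation matrix (pitches x frames)."""
    feats, index = [], []
    for pitch, band in enumerate(activation):
        for start, end in segment_pitch_band(band):
            feats.append(segment_features(band, start, end))
            index.append((pitch, start, end))
    return np.array(feats), index


# Toy usage on a random activation matrix (88 piano pitches x 200 frames).
activation = np.random.rand(88, 200) * 0.3
X, index = extract_candidates(activation)

# Training (labels 1 = true note event, 0 = spurious) would use segments
# aligned with ground-truth annotations; shown here only as a comment.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
# clf.fit(X_train, y_train)
# kept = [idx for idx, p in zip(index, clf.predict(X)) if p == 1]
```

Any classifier exposing the fit/predict interface could be substituted for the random forest, which is consistent with the paper's evaluation of a comprehensive set of supervised classification strategies.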
