Improving Note Segmentation in Automatic Piano Music Transcription Systems with a Two-State Pitch-Wise HMM Method

Many approaches to automatic piano music transcription rely on a multi-pitch estimation step that produces an activity score for each pitch. A second processing step, called note segmentation, must then be performed for each pitch to identify the time intervals during which notes are played. In this study, a pitch-wise, two-state (on/off), first-order Hidden Markov Model (HMM) is developed for note segmentation. A complete parametrization of the HMM sigmoid function is proposed, based on its original regression formulation, including a slope-smoothing parameter α and a thresholding-contrast parameter β. A comparative evaluation of different note segmentation strategies was performed, differentiated according to whether they use a fixed threshold, called “Hard Thresholding” (HT), or an HMM-based thresholding method, called “Soft Thresholding” (ST). The evaluation followed MIREX standards and used the MAPS dataset. In addition, different transcription and recording scenarios were tested using three units of the Audio Degradation Toolbox. Results show that note segmentation through HMM soft thresholding, with a data-based optimization of the {α, β} parameter pair, significantly enhances transcription performance.
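
The difference between hard and soft thresholding can be illustrated with a minimal sketch for a single pitch. The abstract does not give the exact sigmoid parametrization, the transition probabilities, or the decoding procedure, so everything below is an assumption made for illustration: the logistic form `1 / (1 + exp(-alpha * (x - beta)))`, the use of its output as an emission pseudo-likelihood, the symmetric `p_stay` transition probability, Viterbi decoding, and all function names.

```python
import numpy as np


def hard_threshold(activity, threshold):
    """Hard Thresholding (HT): a frame is 'on' when its score exceeds a fixed threshold."""
    return (np.asarray(activity, dtype=float) > threshold).astype(int)


def sigmoid_on_probability(activity, alpha, beta):
    """Hypothetical logistic mapping from activity score to an 'on' pseudo-probability.

    Assumed form: alpha controls the slope, beta the location of the threshold.
    This is not necessarily the parametrization used in the paper.
    """
    x = np.asarray(activity, dtype=float)
    return 1.0 / (1.0 + np.exp(-alpha * (x - beta)))


def viterbi_two_state(p_on, p_stay=0.9, prior_on=0.5):
    """Most likely off(0)/on(1) state path of a two-state first-order HMM.

    p_on is treated as the emission score of the 'on' state and 1 - p_on as
    that of the 'off' state (an assumption). p_stay is an illustrative
    self-transition probability shared by both states.
    """
    p_on = np.clip(np.asarray(p_on, dtype=float), 1e-9, 1.0 - 1e-9)
    n_frames = len(p_on)
    log_emit = np.log(np.stack([1.0 - p_on, p_on], axis=1))  # shape (T, 2)
    log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                 [1.0 - p_stay, p_stay]]))
    log_prior = np.log(np.array([1.0 - prior_on, prior_on]))

    delta = np.empty((n_frames, 2))             # best log-score ending in each state
    back = np.zeros((n_frames, 2), dtype=int)   # best predecessor of each state
    delta[0] = log_prior + log_emit[0]
    for t in range(1, n_frames):
        scores = delta[t - 1][:, None] + log_trans  # scores[i, j]: from state i to j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + log_emit[t]

    # Backtrack from the best final state.
    states = np.empty(n_frames, dtype=int)
    states[-1] = np.argmax(delta[-1])
    for t in range(n_frames - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return states


def states_to_intervals(states, hop_seconds):
    """Convert a 0/1 frame sequence into (onset, offset) intervals in seconds."""
    padded = np.concatenate([[0], np.asarray(states, dtype=int), [0]])
    changes = np.diff(padded)
    onsets = np.flatnonzero(changes == 1)
    offsets = np.flatnonzero(changes == -1)
    return [(float(on * hop_seconds), float(off * hop_seconds))
            for on, off in zip(onsets, offsets)]


# Toy activity curve for one pitch, with a one-frame dip inside a sustained note.
activity = [0.1, 0.1, 0.9, 0.9, 0.2, 0.9, 0.9, 0.1, 0.1]
ht_states = hard_threshold(activity, threshold=0.5)
st_states = viterbi_two_state(sigmoid_on_probability(activity, alpha=10.0, beta=0.5))
print(states_to_intervals(ht_states, hop_seconds=1.0))
print(states_to_intervals(st_states, hop_seconds=1.0))
```

With these illustrative values, the fixed threshold splits the note into two segments at the dip, whereas the HMM path bridges it, because two state changes cost more than one low-scoring frame. In the study, the {α, β} pair is optimized from data; the values used here are arbitrary.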
