Investigation on the use of Hidden-Markov Models in automatic transcription of music

Hidden Markov Models (HMMs) are a ubiquitous tool to model time series data, and have been widely used in two main tasks of Automatic Music Transcription (AMT): note segmentation, i.e. identifying the played notes after a multi-pitch estimation, and sequential post-processing, i.e. correcting note segmentation using training data. In this paper, we employ the multi-pitch estimation method called Probabilistic Latent Component Analysis (PLCA), and develop AMT systems by integrating different HMM-based modules in this framework. For note segmentation, we use two different twostate on/o? HMMs, including a higher-order one for duration modeling. For sequential post-processing, we focused on a musicological modeling of polyphonic harmonic transitions, using a first- and second-order HMMs whose states are defined through candidate note mixtures. These different PLCA plus HMM systems have been evaluated comparatively on two different instrument repertoires, namely the piano (using the MAPS database) and the marovany zither. Our results show that the use of HMMs could bring noticeable improvements to transcription results, depending on the instrument repertoire.

[1]  Anssi Klapuri,et al.  Automatic Music Transcription as We Know it Today , 2004 .

[2]  M.P. Ryynanen,et al.  Polyphonic music transcription using note event modeling , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[3]  Paris Smaragdis,et al.  Relative pitch estimation of multiple instruments , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4]  Michael K. Ng,et al.  Higher-order multivariate Markov chains and their applications , 2008 .

[5]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[6]  Christopher Raphael,et al.  Harmonic analysis with probabilistic graphical models , 2003, ISMIR.

[7]  Beth Logan,et al.  Music summarization using key phrases , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[8]  B. Moore An Introduction to the Psychology of Hearing , 1977 .

[9]  Guillaume Lemaitre,et al.  Real-time Polyphonic Music Transcription with Non-negative Matrix Factorization and Beta-divergence , 2010, ISMIR.

[10]  E. Owens,et al.  An Introduction to the Psychology of Hearing , 1997 .

[11]  Ye Wang,et al.  Key determination of acoustic musical signals , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[12]  Anssi Klapuri,et al.  Automatic Transcription of Melody, Bass Line, and Chords in Polyphonic Music , 2008, Computer Music Journal.

[13]  Daniel P. W. Ellis,et al.  Transcribing Multi-Instrument Polyphonic Music With Hierarchical Eigeninstruments , 2011, IEEE Journal of Selected Topics in Signal Processing.

[14]  Abdelaziz Kriouile,et al.  Automatic word recognition based on second-order hidden Markov models , 1994, IEEE Trans. Speech Audio Process..

[15]  Yang He Extended Viterbi algorithm for second order hidden Markov process , 1988, [1988 Proceedings] 9th International Conference on Pattern Recognition.

[16]  Bhiksha Raj,et al.  A Probabilistic Latent Variable Model for Acoustic Modeling , 2006 .

[17]  Malcolm Slaney,et al.  Acoustic Chord Transcription and Key Extraction From Audio Using Key-Dependent HMMs Trained on Synthesized Audio , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  Daniel P. W. Ellis,et al.  A Discriminative Model for Polyphonic Piano Transcription , 2007, EURASIP J. Adv. Signal Process..

[19]  Anssi Klapuri,et al.  Automatic music transcription: challenges and future directions , 2013, Journal of Intelligent Information Systems.

[20]  Ian H. Witten,et al.  The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression , 1991, IEEE Trans. Inf. Theory.

[21]  J. Stephen Downie,et al.  The Music Information Retrieval Evaluation eXchange (MIREX) , 2006 .

[22]  Tillman Weyde,et al.  An efficient shift-invariant model for polyphonic music transcription , 2013 .

[23]  Roland Badeau,et al.  Multipitch Estimation of Piano Sounds Using a New Probabilistic Spectral Smoothness Principle , 2010, IEEE Transactions on Audio, Speech, and Language Processing.