A Shift-Invariant Latent Variable Model for Automatic Music Transcription

In this work, a probabilistic model for multiple-instrument automatic music transcription is proposed. The model extends the shift-invariant probabilistic latent component analysis method, which is used for spectrogram factorization. Proposed extensions support the use of multiple spectral templates per pitch and per instrument source, as well as a time-varying pitch contribution for each source. Thus, this method can effectively be used for multiple-instrument automatic transcription. In addition, the shift-invariant aspect of the method can be exploited for detecting tuning changes and frequency modulations, as well as for visualizing pitch content. For note tracking and smoothing, pitch-wise hidden Markov models are used. For training, pitch templates from eight orchestral instruments were extracted, covering their complete note range. The transcription system was tested on multiple-instrument polyphonic recordings from the RWC database, a Disklavier data set, and the MIREX 2007 multi-F0 data set. Results demonstrate that the proposed method outperforms leading approaches from the transcription literature, using several error metrics.

[1]  Roland Badeau,et al.  Multipitch Estimation of Piano Sounds Using a New Probabilistic Spectral Smoothness Principle , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Hirokazu Kameoka,et al.  A Multipitch Analyzer Based on Harmonic Temporal Structured Clustering , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  F. J. Cañadas Quesada,et al.  A Multiple-F0 Estimation Approach Based on Gaussian Spectral Modelling for Polyphonic Music Transcription , 2010 .

[4]  Christian Schörkhuber CONSTANT-Q TRANSFORM TOOLBOX FOR MUSIC PROCESSING , 2010 .

[5]  Guy J. Brown,et al.  Multiple F0 Estimation , 2006 .

[6]  Simon Dixon,et al.  Multiple-instrument polyphonic music transcription using a convolutive probabilistic model , 2011 .

[7]  Guillaume Lemaitre,et al.  Real-time Polyphonic Music Transcription with Non-negative Matrix Factorization and Beta-divergence , 2010, ISMIR.

[8]  Bhiksha Raj,et al.  Adobe Systems , 1998 .

[9]  Daniel P. W. Ellis,et al.  Transcribing Multi-Instrument Polyphonic Music With Hierarchical Eigeninstruments , 2011, IEEE Journal of Selected Topics in Signal Processing.

[10]  Hirokazu Kameoka,et al.  Specmurt Analysis of Polyphonic Music Signals , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[11]  J. Stephen Downie,et al.  The Music Information Retrieval Evaluation eXchange (MIREX) , 2006 .

[12]  Masataka Goto,et al.  RWC Music Database: Music genre database and musical instrument sound database , 2003, ISMIR.

[13]  Roland Badeau,et al.  Adaptive harmonic time-frequency decomposition of audio using shift-invariant PLCA , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[14]  Bhiksha Raj,et al.  Probabilistic Latent Variable Models as Nonnegative Factorizations , 2008, Comput. Intell. Neurosci..

[15]  Simon Dixon,et al.  On the Computer Recognition of Solo Piano Music , 2000 .

[16]  Paris Smaragdis Relative-pitch tracking of multiple arbitrary sounds. , 2009, The Journal of the Acoustical Society of America.

[17]  S. Dixon,et al.  MULTIPLE-F 0 ESTIMATION AND NOTE TRACKING USING A CONVOLUTIVE PROBABILISTIC MODEL , 2011 .

[18]  Daniel P. W. Ellis,et al.  A Probabilistic Subspace Model for Multi-instrument Polyphonic Transcription , 2010, ISMIR.

[19]  Daniel P. W. Ellis,et al.  A Discriminative Model for Polyphonic Piano Transcription , 2007, EURASIP J. Adv. Signal Process..

[20]  M.P. Ryynanen,et al.  Polyphonic music transcription using note event modeling , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[21]  Julius O. Smith,et al.  A non-negative framework for joint modeling of spectral structure and temporal dynamics in sound mixtures , 2010 .

[22]  Bhiksha Raj,et al.  A Probabilistic Latent Variable Model for Acoustic Modeling , 2006 .

[23]  Anssi Klapuri,et al.  Signal Processing Methods for Music Transcription , 2006 .

[24]  Simon Dixon,et al.  Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription , 2011, IEEE Journal of Selected Topics in Signal Processing.

[25]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[26]  Anssi Klapuri,et al.  Automatic Transcription of Melody, Bass Line, and Chords in Polyphonic Music , 2008, Computer Music Journal.

[27]  Mert Bay,et al.  Evaluation of Multiple-F0 Estimation and Tracking Systems , 2009, ISMIR.

[28]  Paris Smaragdis,et al.  Relative pitch estimation of multiple instruments , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[29]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .