Bilevel Sparse Models for Polyphonic Music Transcription

In this work, we propose a trainable sparse model for automatic polyphonic music transcription, which incorporates several successful approaches into a unified optimization framework. Our model combines unsupervised synthesis models similar to latent component analysis and nonnegative factorization with metric learning techniques that allow supervised discriminative learning. We develop efficient stochastic gradient training schemes allowing unsupervised, semi-, and fully supervised training of the model as well its adaptation to test data. We show efficient fixed complexity and latency approximation that can replace iterative minimization algorithms in time-critical applications. Experimental evaluation on synthetic and real data shows promising initial results.

[1]  Markus Schedl,et al.  Polyphonic piano note transcription with recurrent neural networks , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[3]  Jean Ponce,et al.  Task-Driven Dictionary Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Simon Dixon,et al.  Polyphonic music transcription using note onset and offset detection , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  Juhan Nam,et al.  A Classification-Based Polyphonic Piano Transcription Approach Using Learned Feature Representations , 2011, ISMIR.

[6]  Mark D. Plumbley,et al.  Polyphonic music transcription by non-negative sparse coding of power spectra , 2004 .

[7]  Daniel P. W. Ellis,et al.  A Discriminative Model for Polyphonic Piano Transcription , 2007, EURASIP J. Adv. Signal Process..

[8]  Guillermo Sapiro,et al.  Real-time Online Singing Voice Separation from Monaural Recordings Using Robust Low-rank Modeling , 2012, ISMIR.

[9]  Bhiksha Raj,et al.  A Probabilistic Latent Variable Model for Acoustic Modeling , 2006 .

[10]  Emmanuel Vincent,et al.  Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[11]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[12]  Judith C. Brown Calculation of a constant Q spectral transform , 1991 .

[13]  Matija Marolt,et al.  A connectionist approach to automatic transcription of polyphonic piano music , 2004, IEEE Transactions on Multimedia.

[14]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[15]  Yann LeCun,et al.  Learning Fast Approximations of Sparse Coding , 2010, ICML.

[16]  Anssi Klapuri,et al.  Multipitch Analysis of Polyphonic Music and Speech Signals Using an Auditory Model , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[17]  Emmanuel Vincent,et al.  Adaptive Harmonic Spectral Decomposition for Multiple Pitch Estimation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  I. Daubechies,et al.  An iterative thresholding algorithm for linear inverse problems with a sparsity constraint , 2003, math/0307152.

[19]  Simon Dixon,et al.  Multiple-instrument polyphonic music transcription using a convolutive probabilistic model , 2011 .

[20]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[21]  Roland Badeau,et al.  Multipitch Estimation of Piano Sounds Using a New Probabilistic Spectral Smoothness Principle , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[22]  Anssi Klapuri,et al.  Automatic Music Transcription: Breaking the Glass Ceiling , 2012, ISMIR.