Multi-Template Shift-Variant Non-Negative Matrix Deconvolution for Semi-Automatic Music Transcription

For semi-automatic music transcription, we extend our framework for shift-variant non-negative matrix deconvolution (svNMD) to support multiple templates per instrument and pitch. We propose a learning algorithm based on k-means clustering that infers the templates from the data, guided by user-provided information. We experimentally explore the maximum transcription accuracy the algorithm can achieve and evaluate its expected performance in a realistic setting. The results show that the Itakura-Saito divergence clearly outperforms the Kullback-Leibler divergence, and that the maximum achievable accuracy improves consistently when each pitch is represented by more than one spectral template.
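
To make the comparison of cost functions concrete, the following minimal sketch computes the two divergences named above using their standard textbook definitions for non-negative data. It is not the svNMD implementation: the function names, the random test matrices, and the scale factor are illustrative assumptions, and the inputs are assumed to be strictly positive. The sketch illustrates one well-known difference between the two measures: the Itakura-Saito divergence is invariant to a common rescaling of both arguments, whereas the generalized Kullback-Leibler divergence grows linearly with the scale.

```python
import numpy as np


def kl_divergence(v, v_hat):
    """Generalized Kullback-Leibler divergence D_KL(v | v_hat), summed over all bins.

    Both inputs are assumed to be strictly positive arrays of equal shape.
    """
    v = np.asarray(v, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    return float(np.sum(v * np.log(v / v_hat) - v + v_hat))


def is_divergence(v, v_hat):
    """Itakura-Saito divergence D_IS(v | v_hat), summed over all bins.

    Both inputs are assumed to be strictly positive arrays of equal shape.
    """
    v = np.asarray(v, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    ratio = v / v_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))


# Hypothetical stand-ins for a magnitude spectrogram and its model approximation.
rng = np.random.default_rng(0)
v = rng.random((6, 4)) + 0.1
v_hat = rng.random((6, 4)) + 0.1
scale = 100.0  # arbitrary common gain applied to both matrices

# Itakura-Saito: unchanged under a common gain.
is_invariant = np.isclose(is_divergence(scale * v, scale * v_hat),
                          is_divergence(v, v_hat))

# Generalized Kullback-Leibler: scales linearly with the gain.
kl_linear = np.isclose(kl_divergence(scale * v, scale * v_hat),
                       scale * kl_divergence(v, v_hat))

print(is_invariant, kl_linear)
```

Scale invariance means that low-energy spectral components contribute to the Itakura-Saito cost on the same footing as high-energy ones. This is a general property of the divergence and not, by itself, an explanation of the results reported here.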
