Instrument Learning and Sparse NMD for Automatic Polyphonic Music Transcription

In this paper, an automatic music transcription (AMT) algorithm based on supervised non-negative matrix decomposition (NMD) is discussed. In particular, a novel approach for enhancing the sparsity of the solution is proposed. It consists of a two-step procedure in which the NMD is solved by combining $\ell_2$ regularization with threshold filtering. In the first step, the NMD is performed with $\ell_2$ regularization to obtain an overall selection of the notes most likely to appear in the monotimbral musical excerpt. In the second step, threshold filtering followed by another $\ell_2$-regularized NMD is performed repeatedly to progressively reduce the dictionary matrix and refine the note transcription. Furthermore, a user-oriented instrument learning procedure is proposed. The proposed AMT system has been tested on the dataset collected by the LabROSA laboratory, considering the transcription of three different pianos. It has also been validated through a comparison with a regularized NMD and with three open-source AMT software packages. The results show the effectiveness of the proposed two-step processing in enhancing the sparsity of the solution and improving the transcription accuracy. Moreover, the proposed system shows promising performance in both multi-F0 estimation and note tracking tasks, achieving better transcription accuracy than the competing algorithms in most tests.
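The two-step procedure can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The $\ell_2$-regularized NMD is solved with the standard multiplicative update for the cost $\tfrac{1}{2}\|V - WH\|_F^2 + \tfrac{\lambda}{2}\|H\|_F^2$ with the dictionary $W$ held fixed. The threshold filtering drops every note whose peak activation falls below $\tau$ times the largest peak. The instrument learning step is reduced to averaging one isolated-note spectrogram per note. The function names `l2_nmd`, `two_step_sparse_nmd`, and `learn_templates`, the parameters `lam`, `tau`, and `max_rounds`, and the toy spectrograms are all hypothetical.

```python
import numpy as np


def learn_templates(note_spectrograms):
    """Hypothetical instrument-learning step: one template per note, taken as
    the mean magnitude spectrum of an isolated-note recording (F x frames),
    scaled to unit l2 norm. The paper's actual procedure may differ."""
    cols = [S.mean(axis=1) for S in note_spectrograms]
    W = np.stack(cols, axis=1)
    return W / np.linalg.norm(W, axis=0, keepdims=True)


def l2_nmd(V, W, lam=1e-3, n_iter=200, eps=1e-12):
    """Supervised NMD: find H >= 0 minimising
    0.5*||V - W H||_F^2 + 0.5*lam*||H||_F^2 with W fixed.
    Solver choice (multiplicative updates) is an assumption."""
    H = np.ones((W.shape[1], V.shape[1]))
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # Negative part of the gradient over its positive part; keeps H >= 0.
        H *= WtV / (WtW @ H + lam * H + eps)
    return H


def two_step_sparse_nmd(V, W, lam=1e-3, tau=0.1, max_rounds=10):
    """Step 1: l2-regularised NMD with the full dictionary.
    Step 2: repeatedly drop notes whose peak activation is below
    tau * (largest peak), then re-solve on the reduced dictionary."""
    K, T = W.shape[1], V.shape[1]
    active = np.arange(K)
    H_active = l2_nmd(V, W, lam)  # step 1: overall note selection
    for _ in range(max_rounds):
        peaks = H_active.max(axis=1)
        keep = peaks >= tau * peaks.max()
        if keep.all():
            break  # the dictionary no longer shrinks
        active = active[keep]
        H_active = l2_nmd(V, W[:, active], lam)  # step 2: refine
    H = np.zeros((K, T))
    H[active] = H_active  # discarded notes keep zero activation
    return H, active


# Toy usage: 5 notes with non-overlapping spectral templates.
F, K, T = 40, 5, 6
shape = np.linspace(1.0, 0.2, 8)
notes = []
for k in range(K):
    S = np.zeros((F, 4))
    S[8 * k : 8 * k + 8, :] = shape[:, None]
    notes.append(S)
W = learn_templates(notes)

H_true = np.zeros((K, T))
H_true[[1, 3]] = 1.0
H_true[0] = 0.01  # weak spurious component
V = W @ H_true

H, active = two_step_sparse_nmd(V, W)
print(active)  # [1 3]
```

Because the $\ell_2$ penalty shrinks activations without driving them to zero, the threshold filter is what enforces sparsity in this sketch. In the toy example, the weak component on note 0 survives step 1 but is removed in step 2, and the remaining notes are re-estimated on the reduced dictionary.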
