Non-Negative Group Sparsity with Subspace Note Modelling for Polyphonic Transcription

Automatic music transcription (AMT) can be performed by deriving a pitch-time representation through decomposition of a spectrogram with a dictionary of pitch-labelled atoms. Typically, non-negative matrix factorisation (NMF) methods are used to decompose magnitude spectrograms. Often a single atom represents each note, although the spectrum of a note may change over time. Previous research has addressed this variability either by using different atoms to model specific parts of a note, or by using large dictionaries composed of data points from the spectrograms of full notes. In this paper, the use of subspace modelling of note spectra is explored, with group sparsity employed as a means of coupling the activations of related atoms into a pitched subspace. Stepwise and gradient-based methods for non-negative group sparse decompositions are proposed. Finally, a group sparse NMF approach is used to tune a generic harmonic subspace dictionary, leading to improved NMF-based AMT results.
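As a rough illustration of the decomposition described above, the following sketch estimates non-negative activations for a fixed dictionary in which several atoms share one pitch label, and then pools each group of activations into a single pitch salience per frame. It is not the stepwise or gradient-based method proposed in the paper: the function name `group_sparse_activations`, the parameters `lam` and `groups`, the choice of KL divergence, the l2,1 group penalty and the heuristic multiplicative update are all assumptions made for this example, and the toy dictionary is invented.

```python
import numpy as np

def group_sparse_activations(V, W, groups, lam=0.1, n_iter=200, eps=1e-12):
    """Hypothetical sketch: find H >= 0 with V ~ W @ H under KL divergence,
    plus an l2,1 penalty coupling atoms that share a pitch label.
    Assumes every atom belongs to exactly one group."""
    groups = np.asarray(groups)
    n_atoms = W.shape[1]
    n_groups = groups.max() + 1
    # Membership matrix: G[g, k] = 1 if atom k belongs to pitch group g.
    G = np.zeros((n_groups, n_atoms))
    G[groups, np.arange(n_atoms)] = 1.0
    rng = np.random.default_rng(0)
    H = rng.random((n_atoms, V.shape[1])) + 0.1
    col_sums = W.sum(axis=0)[:, None]  # W^T 1, shape (n_atoms, 1)
    for _ in range(n_iter):
        WH = W @ H + eps
        # Per-frame l2 norm of each group, broadcast back to its atoms.
        group_norm = np.sqrt(G @ (H ** 2)) + eps
        penalty_grad = H / (G.T @ group_norm)
        # Heuristic multiplicative update (no descent guarantee claimed).
        H *= (W.T @ (V / WH)) / (col_sums + lam * penalty_grad + eps)
    # Pitch salience: l2 norm of each group's activations in each frame.
    salience = np.sqrt(G @ (H ** 2))
    return H, salience

# Toy dictionary: two pitches, two atoms each, with disjoint spectral support.
W = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.2, 0.3, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.4],
              [0.0, 0.0, 0.6, 1.0],
              [0.0, 0.0, 0.1, 0.5]])
groups = [0, 0, 1, 1]
# Frame 0 mixes both atoms of pitch 0; frame 1 contains only pitch 1.
V = np.stack([W[:, 0] + 0.5 * W[:, 1], W[:, 2]], axis=1)

H, salience = group_sparse_activations(V, W, groups)
print(np.argmax(salience, axis=0))  # most salient pitch group per frame
```

Pooling activations by group norm means a note can be explained by any non-negative combination of its atoms, which is the sense in which each pitch is modelled as a subspace rather than as a single fixed spectrum.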
