Paper information - Non-Negative Group Sparsity with Subspace Note Modelling for Polyphonic Transcription

Automatic music transcription (AMT) can be performed by deriving a pitch-time representation through decomposition of a spectrogram with a dictionary of pitch-labelled atoms. Typically, non-negative matrix factorisation (NMF) methods are used to decompose magnitude spectrograms. One atom is often used to represent each note. However, the spectrum of a note may change over time. Previous research has addressed this variability either by using different atoms to model specific parts of a note, or by using large dictionaries composed of data points from the spectrograms of full notes. In this paper, the use of subspace modelling of note spectra is explored, with group sparsity employed as a means of coupling the activations of related atoms into a pitched subspace. Stepwise and gradient-based methods for non-negative group sparse decompositions are proposed. Finally, a group sparse NMF approach is used to tune a generic harmonic subspace dictionary, leading to improved NMF-based AMT results.
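The coupling of related atoms into a pitched subspace can be illustrated with a small NumPy sketch. It is not the algorithm proposed in the paper: it assumes a fixed dictionary, a Euclidean cost, and a commonly used heuristic multiplicative update for a penalty on the sum of per-pitch l2 norms. The function name `group_sparse_activations`, the parameter `lam`, and the random toy dictionary are all hypothetical choices made for this example.

```python
import numpy as np

def group_sparse_activations(V, W, groups, lam=0.1, n_iter=300, eps=1e-12):
    """Estimate H >= 0 with V ~ W @ H, penalising the sum of group l2 norms.

    V: (n_bins, n_frames) non-negative magnitude spectrogram.
    W: (n_bins, n_atoms) fixed non-negative dictionary.
    groups: (n_atoms,) integer pitch label of each atom.
    Returns the activations H and a per-pitch salience (group l2 norm).
    """
    n_groups = int(groups.max()) + 1
    # One-hot group membership, shape (n_groups, n_atoms).
    M = (groups[None, :] == np.arange(n_groups)[:, None]).astype(float)
    H = np.ones((W.shape[1], V.shape[1]))
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # l2 norm of each pitch group in each frame, shape (n_groups, n_frames).
        group_norm = np.sqrt(M @ H**2)
        # Gradient of lam * (sum of group norms) with respect to H.
        penalty = lam * H / (M.T @ group_norm + eps)
        # Heuristic multiplicative update (assumed here, not taken from the paper).
        H *= WtV / (WtW @ H + penalty + eps)
    salience = np.sqrt(M @ H**2)
    return H, salience

# Toy data: 3 pitches with 2 atoms each; pitches 0 and 2 sound, pitch 1 is silent.
rng = np.random.default_rng(0)
n_bins, n_pitches, atoms_per_pitch, n_frames = 64, 3, 2, 5
groups = np.repeat(np.arange(n_pitches), atoms_per_pitch)
W = rng.random((n_bins, n_pitches * atoms_per_pitch)) ** 4  # peaky random atoms
W /= np.linalg.norm(W, axis=0, keepdims=True)               # unit-norm atoms
H_true = np.zeros((W.shape[1], n_frames))
H_true[groups == 0] = 1.0
H_true[groups == 2] = 0.5
V = W @ H_true

H, salience = group_sparse_activations(V, W, groups)
print(np.round(salience, 3))  # rows are pitches, columns are frames
```

Each row of `salience` plays the role of one row of the pitch-time representation: because the penalty acts on the norm of a whole group, the atoms belonging to a pitch tend to switch on or off together rather than individually.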
