Paper information - NMF With Time–Frequency Activations to Model Nonstationary Audio Events

Real-world sounds often exhibit time-varying spectral shapes, as observed in the spectrogram of a harpsichord tone or in that of a transition between two pronounced vowels. Whereas standard non-negative matrix factorization (NMF) assumes fixed spectral atoms, an extension is proposed in which the temporal activations (the coefficients of the decomposition on the spectral atom basis) become frequency dependent and follow a time-varying autoregressive moving average (ARMA) model. This extension can thus be interpreted with the help of a source/filter paradigm and is referred to as source/filter factorization. The factorization leads to an efficient single-atom decomposition of a single audio event with strong spectral variation (but with constant pitch). The new algorithm is tested on real audio data and shows promising results.
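The model structure described in the abstract can be illustrated with a minimal sketch. It only builds the modeled spectrogram from given parameters and does not implement the estimation algorithm of the paper. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names `arma_response` and `source_filter_model`, the parameter layout (`W`, `sigma2`, `B`, `A`), the frequency grid of normalized frequencies in [0, 0.5], and the choice of writing each activation as a gain times the ARMA power response |B|^2 / |A|^2.

```python
import numpy as np

def arma_response(b, a, n_freq):
    """Power response |B(nu)|^2 / |A(nu)|^2 of an ARMA filter.

    Evaluated on n_freq normalized frequencies in [0, 0.5] (assumed grid).
    b holds the MA coefficients, a the AR coefficients.
    """
    nu = np.linspace(0.0, 0.5, n_freq)

    def poly(c):
        # Evaluate sum_k c[k] * exp(-2i*pi*nu*k) at every frequency.
        k = np.arange(len(c))
        return np.exp(-2j * np.pi * np.outer(nu, k)) @ np.asarray(c, dtype=complex)

    return np.abs(poly(b)) ** 2 / np.abs(poly(a)) ** 2

def source_filter_model(W, sigma2, B, A):
    """Build a model spectrogram with frequency-dependent activations.

    W:      (F, R) nonnegative spectral atoms
    sigma2: (R, T) nonnegative gains, one per atom and frame
    B:      (R, T, Q+1) MA coefficients per atom and frame (hypothetical layout)
    A:      (R, T, P+1) AR coefficients per atom and frame (hypothetical layout)
    Returns V_hat of shape (F, T).
    """
    F, R = W.shape
    T = sigma2.shape[1]
    V_hat = np.zeros((F, T))
    for r in range(R):
        for t in range(T):
            # The activation of atom r at frame t is a function of frequency.
            h = sigma2[r, t] * arma_response(B[r, t], A[r, t], F)
            V_hat[:, t] += W[:, r] * h
    return V_hat

# Usage: one atom whose spectral envelope changes over four frames.
rng = np.random.default_rng(0)
W = rng.random((16, 1))
sigma2 = np.ones((1, 4))
B = np.ones((1, 4, 1))                      # no MA part
A = np.stack([[[1.0, -0.2 * t] for t in range(4)]])  # AR(1) pole moving with t
V_hat = source_filter_model(W, sigma2, B, A)
```

With filters of order zero (`B` and `A` both equal to 1) the response is flat and the sketch reduces to the standard NMF product `W @ sigma2`, which is a convenient sanity check on the structure.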