NMF With Time–Frequency Activations to Model Nonstationary Audio Events

Real-world sounds often exhibit time-varying spectral shapes, as observed in the spectrogram of a harpsichord tone or that of a transition between two pronounced vowels. Whereas the standard non-negative matrix factorization (NMF) assumes fixed spectral atoms, an extension is proposed where the temporal activations (coefficients of the decomposition on the spectral atom basis) become frequency dependent and follow a time-varying autoregressive moving average (ARMA) modeling. This extension can thus be interpreted with the help of a source/filter paradigm and is referred to as source/filter factorization. This factorization leads to an efficient single-atom decomposition for a single audio event with strong spectral variation (but with constant pitch). The new algorithm is tested on real audio data and shows promising results.

[1]  Jouni Paulus,et al.  Drum transcription with non-negative spectrogram factorisation , 2005, 2005 13th European Signal Processing Conference.

[2]  Raul Kompass,et al.  A Generalized Divergence Measure for Nonnegative Matrix Factorization , 2007, Neural Computation.

[3]  Emmanuel Vincent,et al.  Harmonic and inharmonic Nonnegative Matrix Factorization for Polyphonic Pitch transcription , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4]  Inderjit S. Dhillon,et al.  Generalized Nonnegative Matrix Approximations with Bregman Divergences , 2005, NIPS.

[5]  Emmanuel Vincent,et al.  Adaptive Harmonic Spectral Decomposition for Multiple Pitch Estimation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Alexey Ozerov,et al.  Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Roland Badeau,et al.  Weighted maximum likelihood autoregressive and moving average spectrum modeling , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[8]  Nancy Bertin,et al.  Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.

[9]  Roland Badeau,et al.  Blind Signal Decompositions for Automatic Transcription of Polyphonic Music: NMF and K-SVD on the Benchmark , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[10]  Morten Mørup,et al.  Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation , 2006, ICA.

[11]  Stéphane Mallat,et al.  Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[12]  Gaël Richard,et al.  Singer melody extraction in polyphonic signals using source separation methods , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  Mark D. Plumbley,et al.  Unsupervised analysis of polyphonic music by sparse coding , 2006, IEEE Transactions on Neural Networks.

[14]  Pierre Comon,et al.  Independent component analysis, A new concept? , 1994, Signal Process..

[15]  Paris Smaragdis,et al.  Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs , 2004, ICA.

[16]  Gaël Richard,et al.  An iterative approach to monaural musical mixture de-soloing , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[17]  D. Fitzgerald,et al.  Non-negative Tensor Factorisation for Sound Source Separation , 2005 .

[18]  Andrzej Cichocki,et al.  Csiszár's Divergences for Non-negative Matrix Factorization: Family of New Algorithms , 2006, ICA.

[19]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[20]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[21]  Bhiksha Raj,et al.  Adobe Systems , 1998 .

[22]  Emmanuel Vincent,et al.  Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[23]  Rémi Gribonval,et al.  Harmonic decomposition of audio signals with matching pursuit , 2003, IEEE Trans. Signal Process..

[24]  Emmanuel Vincent,et al.  Instrument-Specific Harmonic Atoms for Mid-Level Music Representation , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[25]  Hiroshi Sawada,et al.  Audio source separation based on independent component analysis , 2004, 2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512).

[26]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.