Shifted and convolutive source-filter non-negative matrix factorization for monaural audio source separation

This paper proposes an extension of non-negative matrix factorization (NMF), which combines the shifted NMF model with the source-filter model. Shifted NMF was proposed as a powerful approach for monaural source separation and multiple fundamental frequency (F0) estimation, which is particularly unique in that it takes account of the constant inter-harmonic spacings of a harmonic structure in log-frequency representations and uses a shifted copy of a spectrum template to represent the spectra of different F0s. However, for those sounds that follow the source-filter model, this assumption does not hold in reality, since the filter spectra are usually invariant under F0 changes. A more reasonable way to represent the spectrum of a different F0 is to use a shifted copy of a harmonic structure template as the excitation spectrum and keep the filter spectrum fixed. Thus, we can describe the spectrogram of a mixture signal as the sum of the products between the shifted copies of excitation spectrum templates and filter spectrum templates. Furthermore, the time course of filter spectra represents the dynamics of the timbre, which is important for characterizing the feature of an instrument sound. Thus, we further incorporate the non-negative matrix factor deconvolution (NMFD) model into the above model to describe the filter spectrogram. We derive a computationally efficient and convergence-guaranteed algorithm for estimating the unknown parameters of the constructed model based on the auxiliary function approach. Experimental results revealed that the proposed method outperformed shifted NMF in terms of the source separation accuracy.

[1]  Shigeki Sagayama,et al.  Multipitch Analysis with Harmonic Nonnegative Matrix Approximation , 2007, ISMIR.

[2]  Tuomas Virtanen,et al.  Unsupervised Learning Methods for Source Separation in Monaural Music Signals , 2006 .

[3]  Kameoka Hirokazu Statistical speech spectrum model incorporating all-pole vocal tract model and F_0 contour generating process model , 2010 .

[4]  亀岡 弘和,et al.  Statistical approach to multipitch analysis , 2007 .

[5]  施晓妍 Signal processing method and device , 2012 .

[6]  Roland Badeau,et al.  ON AUDIO , SPEECH , AND LANGUAGE PROCESSING 1 Harmonic Adaptive Latent Component Analysis of Audio and Application to Music Transcription , 2013 .

[7]  James M. Ortega,et al.  Iterative solution of nonlinear equations in several variables , 2014, Computer science and applied mathematics.

[8]  Hirokazu Kameoka,et al.  Specmurt Analysis of Polyphonic Music Signals , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[9]  S. Rickard,et al.  Towards shifted NMF for improved monaural separation , 2013 .

[10]  Tuomas Virtanen,et al.  Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization , 2011, IEEE Journal of Selected Topics in Signal Processing.

[11]  Bhiksha Raj,et al.  Adobe Systems , 1998 .

[12]  Hirokazu Kameoka,et al.  Specmurt anasylis: a piano-roll-visualization of polyphonic music signal by deconvolution of log-frequency spectrum , 2004, SAPA@INTERSPEECH.

[13]  Anssi Klapuri,et al.  Missing template estimation for user-assisted music transcription , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[14]  Nancy Bertin,et al.  Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.

[15]  Morten Mørup,et al.  Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation , 2006, ICA.

[16]  D. Hunter,et al.  Quantile Regression via an MM Algorithm , 2000 .

[17]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  Hirokazu Kameoka,et al.  Fast Signal Reconstruction from Magnitude Spectrogram of Continuous Wavelet Transform Based on Spectrogram Consistency , 2014, DAFx.

[19]  Emmanuel Vincent,et al.  Harmonic and inharmonic Nonnegative Matrix Factorization for Polyphonic Pitch transcription , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[20]  Roland Badeau,et al.  Score informed audio source separation using a parametric model of non-negative spectrogram , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[21]  Hirokazu Kameoka,et al.  Composite autoregressive system for sparse source-filter representation of speech , 2009, 2009 IEEE International Symposium on Circuits and Systems.

[22]  Anssi Klapuri,et al.  Analysis of Musical Instrument Sounds by Source-Filter-Decay Model , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[23]  D. Fitzgerald,et al.  Shifted non-negative matrix factorisation for sound source separation , 2005, IEEE/SP 13th Workshop on Statistical Signal Processing, 2005.

[24]  Paris Smaragdis,et al.  Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs , 2004, ICA.

[25]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[26]  Hirokazu Kameoka,et al.  Harmonic-Temporal Factor Decomposition Incorporating Music Prior Information for Informed Monaural Source Separation , 2014, ISMIR.

[27]  Bryan Pardo,et al.  Soundprism: An Online System for Score-Informed Source Separation of Music Audio , 2011, IEEE Journal of Selected Topics in Signal Processing.