Nonnegative Tensor Factorization with Frequency Modulation Cues for Blind Audio Source Separation

We present Vibrato Nonnegative Tensor Factorization, an algorithm for single-channel unsupervised audio source separation with an application to separating instrumental or vocal sources with nonstationary pitch from music recordings. Our approach extends Nonnegative Matrix Factorization for audio modeling by including local estimates of frequency modulation as cues in the separation. This permits the modeling and unsupervised separation of vibrato or glissando musical sources, which is not possible with the basic matrix factorization formulation. The algorithm factorizes a sparse nonnegative tensor comprising the audio spectrogram and local frequency-slope-to-frequency ratios, which are estimated at each time-frequency bin using the Distributed Derivative Method. The use of local frequency modulations as separation cues is motivated by the principle of common fate partial grouping from Auditory Scene Analysis, which hypothesizes that each latent source in a mixture is characterized perceptually by coherent frequency and amplitude modulations shared by its component partials. We derive multiplicative factor updates by Minorization-Maximization, which guarantees convergence to a local optimum by iteration. We then compare our method to the baseline on two separation tasks: one considers synthetic vibrato notes, while the other considers vibrato string instrument recordings.

[1]  J.B. Allen,et al.  A unified approach to short-time Fourier analysis and synthesis , 1977, Proceedings of the IEEE.

[2]  Paris Smaragdis,et al.  Static and Dynamic Source Separation Using Nonnegative Factorizations: A unified view , 2014, IEEE Signal Processing Magazine.

[3]  D. Hunter,et al.  A Tutorial on MM Algorithms , 2004 .

[4]  Tom Barker,et al.  Non-negative tensor factorisation of modulation spectrograms for monaural sound source separation , 2013, INTERSPEECH.

[5]  Paris Smaragdis,et al.  Directional NMF for joint source localization and separation , 2015, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[6]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  DeLiang Wang,et al.  Monaural Musical Sound Separation Based on Pitch and Common Amplitude Modulation , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Albert S. Bregman,et al.  The Auditory Scene. (Book Reviews: Auditory Scene Analysis. The Perceptual Organization of Sound.) , 1990 .

[9]  Guy J. Brown,et al.  Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006 .

[10]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[11]  Philippe Depalle,et al.  A unified view of non-stationary sinusoidal parameter estimation methods using signal derivatives , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12]  Michaël Betser,et al.  Sinusoidal Polynomial Parameter Estimation Using the Distribution Derivative , 2009, IEEE Transactions on Signal Processing.

[13]  S. McAdams Segregation of concurrent sounds. I: Effects of frequency modulation coherence. , 1989, The Journal of the Acoustical Society of America.

[14]  Bhiksha Raj,et al.  A Probabilistic Latent Variable Model for Acoustic Modeling , 2006 .

[15]  P. Depalle,et al.  Perceptual Evaluation of Vibrato Models , 2005 .

[16]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[17]  Jérôme Idier,et al.  Algorithms for Nonnegative Matrix Factorization with the β-Divergence , 2010, Neural Computation.

[18]  A. L. Wang Instantaneous and frequency-warped techniques for source separation and signal parametrization , 1995, Proceedings of 1995 Workshop on Applications of Signal Processing to Audio and Accoustics.

[19]  Bhiksha Raj,et al.  Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures , 2007, ICA.

[20]  Emmanuel Vincent,et al.  Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[21]  Elliot Creager,et al.  Musical source separation by coherent frequency modulation cues , 2016 .

[22]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[23]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.