Non-negative tensor factorization models for Bayesian audio processing

We provide an overview of matrix and tensor factorization methods from a Bayesian perspective, with emphasis on both inference methods and modeling techniques. Factorization-based models and their many extensions, such as tensor factorizations, have proved useful in a broad range of applications, providing a practical and computationally tractable framework for modeling. In audio processing especially, tensor models make it possible to incorporate, in a unified manner, prior knowledge about signals, the data generation processes, and available data from different modalities. After a general review of tensor models, we describe the general statistical framework, give examples of several audio applications, and describe modeling strategies for key problems such as deconvolution, source separation, and transcription.
