Chapter 2 Non-negative Matrix Factorization and Its Variants for Audio Signal Processing

In this chapter, I briefly introduce a multivariate analysis technique called non-negative matrix factorization (NMF), which has attracted considerable attention in audio signal processing in recent years. I cover some basic properties of NMF, the effects induced by the non-negativity constraints, the derivation of an iterative algorithm for NMF, and some attempts to apply NMF to audio processing problems.
