Probabilistic Modeling Paradigms for Audio Source Separation

Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, we focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models, and spectral template-based models. We show that most models are instances of one of two general paradigms: linear modeling or variance modeling. We compare the merits of the two paradigms and report objective performance figures. We conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
