Spatial location priors for Gaussian model-based reverberant audio source separation

We consider the Gaussian framework for reverberant audio source separation, in which each source is modeled in the time-frequency domain by its short-term power spectrum and its spatial covariance matrix. We propose two alternative probabilistic priors over the spatial covariance matrices that are consistent with the theory of statistical room acoustics, and we derive expectation-maximization (EM) algorithms for maximum a posteriori (MAP) estimation. We argue that these algorithms provide a statistically principled solution to the permutation problem and mitigate the risk of overfitting inherent in conventional maximum likelihood (ML) estimation. We show experimentally that, in a semi-informed scenario where the source positions and certain room characteristics are known, the MAP algorithms outperform their ML counterparts. This opens the way to a rigorous statistical treatment of this family of models in other scenarios.
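As a rough illustration of the kind of model the abstract refers to, the sketch below sets up, in NumPy, a full-rank local Gaussian model in which each source image at a time-frequency point has covariance v_j(f,n) R_j(f): a spatial covariance prior mean combining a rank-1 direct-path term with a diffuse-field term whose coherence follows statistical room acoustics, a multichannel Wiener-filter E-step, and a MAP-style shrinkage of the spatial covariance toward that prior mean. All function names (diffuse_coherence, prior_spatial_covariance, wiener_e_step, map_spatial_update), the constant SPEED_OF_SOUND, and the exact update formula are illustrative assumptions, not the algorithms derived in the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def diffuse_coherence(freq, mic_dists):
    # Spatial coherence of an ideal diffuse (reverberant) sound field between
    # microphone pairs: sin(2*pi*f*d/c) / (2*pi*f*d/c).
    # np.sinc(x) = sin(pi*x)/(pi*x), so we pass 2*f*d/c.
    return np.sinc(2.0 * freq * mic_dists / SPEED_OF_SOUND)

def prior_spatial_covariance(freq, mic_pos, src_pos, sigma_rev2):
    # Theoretical spatial covariance suggested by statistical room acoustics:
    # a rank-1 direct-path term plus a full-rank diffuse reverberant term.
    dists = np.linalg.norm(mic_pos - src_pos, axis=1)        # source-to-mic distances
    gains = 1.0 / (4.0 * np.pi * dists)                      # free-field attenuation
    steering = gains * np.exp(-2j * np.pi * freq * dists / SPEED_OF_SOUND)
    mic_dists = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=2)
    return np.outer(steering, steering.conj()) + sigma_rev2 * diffuse_coherence(freq, mic_dists)

def wiener_e_step(x_fn, v_fn, R_f):
    # E-step of an EM algorithm for the local Gaussian model: each source image
    # is estimated by the multichannel Wiener filter v_j R_j Sigma_x^{-1} x.
    Sigma_x = sum(v_j * R_j for v_j, R_j in zip(v_fn, R_f))
    inv_Sigma_x = np.linalg.inv(Sigma_x)
    return [v_j * R_j @ inv_Sigma_x @ x_fn for v_j, R_j in zip(v_fn, R_f)]

def map_spatial_update(hatR_sum, n_frames, prior_mean, prior_weight):
    # Illustrative MAP-style M-step for one source's spatial covariance: the ML
    # estimate (sum of weighted posterior covariances over frames, divided by the
    # frame count) is shrunk toward the acoustically motivated prior mean, with
    # prior_weight acting as the prior's concentration. Generic regularized
    # update for illustration only; not the paper's exact formula.
    return (hatR_sum + prior_weight * prior_mean) / (n_frames + prior_weight)

# Toy usage: two microphones 5 cm apart, one source 1 m away, at f = 1 kHz.
mic_pos = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0]])
R_prior = prior_spatial_covariance(1000.0, mic_pos, np.array([1.0, 0.0, 0.0]), sigma_rev2=0.1)
```

In such a sketch, tying each R_j(f) to a location-dependent prior mean across all frequency bins is what couples the bins together and hence resolves the permutation ambiguity that purely bin-wise ML estimation would leave open.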
