Glimpsing IVA: A Framework for Overcomplete/Complete/Undercomplete Convolutive Source Separation

Independent vector analysis (IVA) is a method for separating convolutively mixed signals that significantly reduces the occurrence of the well-known permutation problem in frequency-domain blind source separation (BSS). In this paper, we develop a novel IVA-based unifying framework for overcomplete/complete/undercomplete convolutive noisy BSS. We show that, for the sources to be separable in the frequency domain, they must have a temporal dynamic structure. We exploit a common form of dynamics, especially present in speech, in which the signals contain intermittent silence periods, so that the set of active sources varies with time. This feature is extremely useful in dealing with overcomplete situations. We propose an approach using hidden Markov models (HMMs) that takes advantage of the different combinations of silence gaps of the source signals at each time period. This enables the algorithm to “glimpse,” or listen in the gaps, compensating for the global degeneracy by learning the mixing matrices during periods where the mixture is locally less degenerate. The same glimpsing strategy can be applied to the complete/undercomplete case as well. Moreover, additive noise is considered in our model. Real and simulated experiments were carried out for overcomplete convolutive mixtures of speech signals, yielding improved separation results compared to a sparsity-based robust time-frequency masking method. Signal-to-disturbance ratio (SDR) and the machine intelligibility of a speech recognizer were used to evaluate performance. Experiments were also conducted for the classical complete setting, where the proposed algorithm compares favorably with standard IVA.
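To make the glimpsing idea concrete, the following minimal sketch enumerates the on/off activity combinations of the sources and marks those in which the number of active sources does not exceed the number of sensors. It illustrates only the counting argument and is not the paper's HMM or its learning procedure. The function names, the example of 3 sources with 2 microphones, and the convention that "overcomplete" means more active sources than sensors are assumptions made for illustration.

```python
from itertools import product


def activity_states(num_sources):
    """Return all on/off combinations of the sources (1 = active, 0 = silent)."""
    return list(product((0, 1), repeat=num_sources))


def local_regime(state, num_mics):
    """Classify one activity state by comparing active sources to sensors.

    Assumed convention: "overcomplete" means more active sources than sensors.
    """
    active = sum(state)
    if active > num_mics:
        return "overcomplete"
    if active == num_mics:
        return "complete"
    return "undercomplete"


def glimpse_states(num_sources, num_mics):
    """States in which the mixture is locally not overcomplete."""
    return [
        state
        for state in activity_states(num_sources)
        if local_regime(state, num_mics) != "overcomplete"
    ]


# Hypothetical example: 3 sources recorded by 2 microphones.
states = activity_states(3)
glimpses = glimpse_states(3, 2)
for state in states:
    print(state, local_regime(state, 2))
```

In this example only the state with all three sources active is locally overcomplete. Every state with at least one silent source offers a "glimpse" in which the mixture is locally complete or undercomplete, although the all-silent state carries no information about the mixing.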
