Independent vector analysis for source separation using an energy driven mixed Student's t and super Gaussian source prior

Independent vector analysis (IVA) can theoretically avoid the permutation problem in frequency domain blind source separation by using a multivariate source prior that retains the dependency between the frequency bins of each source. The performance of the IVA method, however, depends strongly on the choice of source prior. Recently, a fixed combination of the Student's t distribution and the super Gaussian distribution originally used in the IVA method has been found to improve performance. Because speech is non-stationary, however, this combination should adapt to the statistical properties of the measured speech mixtures. In this work we therefore propose a new energy driven mixed multivariate Student's t and super Gaussian source prior for the IVA algorithm. To improve performance further, the clique based IVA method is used to exploit the strong dependency between neighbouring frequency components. The new algorithm is evaluated on mixtures formed from TIMIT speech signals and real room impulse responses, and it shows a performance improvement over the conventional IVA method with a fixed source prior.
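To make the idea of an energy driven mixed prior concrete, the following is a minimal sketch in Python with NumPy, not the algorithm proposed in this work. It uses the score functions commonly associated with the spherically symmetric super Gaussian prior and the multivariate Student's t prior in the IVA literature, and blends them with a per-frame weight. The weighting rule (frame energy normalised by the maximum frame energy), the degrees of freedom value nu = 4, and all function names are assumptions made for illustration only.

```python
import numpy as np

def super_gaussian_score(Y):
    """Score function of the spherical super Gaussian prior.

    Y is a complex array of shape (K, T): K frequency bins, T time frames
    of a single source estimate. Each bin is divided by the norm taken
    across all bins of the same frame, which couples the bins together.
    """
    norm = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0, keepdims=True))
    return Y / np.maximum(norm, 1e-12)  # guard against silent frames

def student_t_score(Y, nu=4.0):
    """Score function of a multivariate Student's t prior (nu is assumed)."""
    K = Y.shape[0]
    energy = np.sum(np.abs(Y) ** 2, axis=0, keepdims=True)
    return ((nu + K) / nu) * Y / (1.0 + energy / nu)

def energy_weight(Y):
    """Hypothetical weighting rule: frame energy scaled into [0, 1]."""
    energy = np.sum(np.abs(Y) ** 2, axis=0, keepdims=True)
    peak = np.max(energy)
    return energy / peak if peak > 0 else np.zeros_like(energy)

def mixed_score(Y, nu=4.0):
    """Blend the two score functions frame by frame using the energy weight."""
    w = energy_weight(Y)  # shape (1, T), broadcast over the K bins
    return w * student_t_score(Y, nu) + (1.0 - w) * super_gaussian_score(Y)
```

In this sketch, high-energy frames lean on the heavier-tailed Student's t score and low-energy frames lean on the super Gaussian score. That direction of the mapping is itself an assumption; any rule that maps measured energy to a weight in [0, 1] could be substituted in `energy_weight`.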
