Blind Extraction of Dominant Target Sources Using ICA and Time-Frequency Masking

This paper presents a method for enhancing target sources of interest and suppressing other interference sources. The target sources are assumed to be close to the sensors, to have dominant power at these sensors, and to be non-Gaussian. The enhancement is performed blindly, i.e., without knowing the position or active time of each source. We consider a general case where the total number of sources is larger than the number of sensors, and neither the number of target sources nor the total number of sources is known. The method is based on a two-stage process in which independent component analysis (ICA) is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. We propose a sophisticated new method for deciding the number of target sources and then selecting their frequency components. We also propose a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail-party situations in a room with a reverberation time of 130 ms are presented to show the effectiveness and characteristics of the proposed method.
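The following is a minimal sketch, not the authors' implementation, of the two-stage structure described above: complex-valued ICA applied independently in each frequency bin (here a natural-gradient update with a polar nonlinearity), followed by a crude binary time-frequency mask that keeps points where the selected component dominates the others. The permutation alignment across bins, the scaling correction, the paper's method for deciding the number of target sources, and its mask criterion are all omitted; the nonlinearity, step size, threshold, and function names below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def ica_per_bin(Xf, n_iter=200, mu=0.1, eps=1e-12):
    """Natural-gradient complex ICA for a single frequency bin (a sketch).
    Xf: (n_sensors, n_frames) complex STFT slices. Returns a demixing matrix W."""
    m, n = Xf.shape
    W = np.eye(m, dtype=complex)
    for _ in range(n_iter):
        Y = W @ Xf
        phi = Y / (np.abs(Y) + eps)                 # polar nonlinearity phi(y) = y/|y|
        grad = np.eye(m) - (phi @ Y.conj().T) / n   # natural-gradient direction
        W = W + mu * grad @ W
    return W

def separate_and_mask(x, fs, nperseg=1024):
    """x: (n_sensors, n_samples) mixture with n_sensors >= 2.
    Returns a masked time-domain estimate of the component with the largest
    output power in each bin (a crude proxy for 'dominant target' selection;
    no permutation or scaling correction is performed)."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg)       # X: (sensors, freqs, frames)
    n_sensors, n_freq, n_frames = X.shape
    Y = np.zeros_like(X)
    for k in range(n_freq):
        W = ica_per_bin(X[:, k, :])
        Y[:, k, :] = W @ X[:, k, :]
    target = np.zeros((n_freq, n_frames), dtype=complex)
    for k in range(n_freq):
        # pick the highest-power ICA output in this bin as the "target"
        i = int(np.argmax(np.mean(np.abs(Y[:, k, :]) ** 2, axis=1)))
        others = np.delete(Y[:, k, :], i, axis=0)
        mask = np.abs(Y[i, k, :]) > np.max(np.abs(others), axis=0)  # binary mask
        target[k, :] = Y[i, k, :] * mask
    _, s = istft(target, fs=fs, nperseg=nperseg)
    return s
```

In a complete frequency-domain system, the per-bin outputs would additionally have to be aligned across frequencies (the permutation problem) and rescaled before masking and resynthesis; the sketch skips both steps for brevity.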
