A comparison of auditory and blind separation techniques for speech segregation

A fundamental problem in auditory and speech processing is the segregation of speech from concurrent sounds. This problem has been a central focus of computational auditory scene analysis (CASA), and it has also been investigated from the perspective of blind source separation. Using a standard corpus of voiced speech mixed with interfering sounds, we compare CASA and blind source separation techniques, which have been developed largely independently. Our comparison reveals that the two approaches perform well under very different conditions. We draw a number of conclusions regarding their relative strengths and weaknesses, both for speech segregation applications and for modeling auditory function.
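
To make the blind-source-separation viewpoint concrete, the sketch below separates a hypothetical two-sensor instantaneous mixture of a synthetic "voiced" signal and a broadband interferer using FastICA from scikit-learn. This is an illustration only, under assumed mixing conditions and source models; the comparison reported here is based on real speech mixtures and on the specific CASA and blind separation algorithms cited in the text.

```python
# Minimal, hypothetical sketch of the blind-source-separation viewpoint:
# two synthetic sources (a periodic stand-in for voiced speech and a
# broadband noise interferer) are mixed instantaneously at two sensors
# and unmixed with FastICA. Sample rate, mixing matrix, and source models
# are all assumptions made for illustration, not the paper's setup.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
fs = 8000                                  # assumed sample rate (Hz)
t = np.arange(2 * fs) / fs                 # two seconds of signal

# Stand-ins for a voiced-speech source and an interfering source.
voiced = np.sign(np.sin(2 * np.pi * 120 * t)) * np.sin(2 * np.pi * 240 * t)
interferer = rng.uniform(-1.0, 1.0, t.size)

S = np.c_[voiced, interferer]              # true sources, shape (n_samples, 2)
A = np.array([[1.0, 0.6],                  # unknown instantaneous mixing matrix
              [0.4, 1.0]])                 # (assumed for this illustration)
X = S @ A.T                                # two-sensor observations

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)               # estimates, up to scaling/permutation

# Crude check: correlation of each estimate with each true source.
corr = np.abs(np.corrcoef(np.c_[S, S_hat].T))[:2, 2:]
print(np.round(corr, 2))
```

Under these assumed instantaneous, noise-free mixing conditions, ICA-style separation is straightforward; the conditions under which CASA or blind separation performs better on real speech mixtures are precisely what the reported comparison examines.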
