MULTICHANNEL SIGNAL SEPARATION FOR COCKTAIL PARTY SPEECH RECOGNITION: A DYNAMIC RECURRENT NETWORK

Abstract: This paper presents a method for multichannel signal separation (MSS) and applies it to cocktail party speech recognition. First, we present a fundamental principle for multichannel signal separation that exploits the spatial independence of the located sources as well as the temporal dependence of speech signals. Second, for practical implementation of the signal separation filter, we consider a dynamic recurrent network and develop a simple new learning algorithm. Performance is evaluated in terms of word recognition error rate (WER) in a large speech recognition experiment. The results show that the proposed method dramatically improves word recognition performance in the case of two simultaneous speech inputs, and that a timing effect is involved in the segregation process.
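As a rough illustration of what a recurrent (feedback) separation filter can look like, the sketch below runs a generic two-channel network in which each output subtracts a filtered copy of the other output's past. It is not the architecture or learning algorithm of this paper, which the abstract does not specify. The function names `mix` and `recurrent_separate`, the delayed cross-talk mixing model, and the hand-set weights are all assumptions chosen for illustration, and no adaptation rule is shown.

```python
import random


def mix(s1, s2, a, b, d):
    """Hypothetical mixing model (an assumption, not from the paper):
    each sensor picks up its own source plus a scaled copy of the
    other source delayed by d samples."""
    n = len(s1)
    x1 = [s1[t] + (a * s2[t - d] if t >= d else 0.0) for t in range(n)]
    x2 = [s2[t] + (b * s1[t - d] if t >= d else 0.0) for t in range(n)]
    return x1, x2


def recurrent_separate(x1, x2, w12, w21):
    """Generic feedback network with FIR cross filters.

    Tap k of each filter acts on the other output delayed by k + 1 samples:
        y1[t] = x1[t] - sum_k w12[k] * y2[t - k - 1]
        y2[t] = x2[t] - sum_k w21[k] * y1[t - k - 1]
    """
    n = len(x1)
    y1 = [0.0] * n
    y2 = [0.0] * n
    for t in range(n):
        fb1 = sum(w * y2[t - k - 1] for k, w in enumerate(w12) if t - k - 1 >= 0)
        fb2 = sum(w * y1[t - k - 1] for k, w in enumerate(w21) if t - k - 1 >= 0)
        y1[t] = x1[t] - fb1
        y2[t] = x2[t] - fb2
    return y1, y2


# Toy usage: random stand-ins for two speech signals.
random.seed(0)
s1 = [random.uniform(-1.0, 1.0) for _ in range(200)]
s2 = [random.uniform(-1.0, 1.0) for _ in range(200)]
a, b, d = 0.5, 0.3, 3
x1, x2 = mix(s1, s2, a, b, d)

# Weights set by hand to the values that cancel the cross-talk for this
# mixing model; a learning algorithm would have to find them from x1, x2.
w12 = [0.0] * (d - 1) + [a]
w21 = [0.0] * (d - 1) + [b]
y1, y2 = recurrent_separate(x1, x2, w12, w21)
max_err = max(max(abs(u - v) for u, v in zip(y1, s1)),
              max(abs(u - v) for u, v in zip(y2, s2)))
```

With the cross filters matched to the mixing model, the feedback cancels the cross-talk and the outputs track the original sources, so `max_err` stays at rounding level. In a blind setting the weights are unknown, which is the gap a learning algorithm has to close.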
