Blind signal separation for recognizing overlapped speech.

The use of a blind signal separation method to enhance speech recognition accuracy in multi-speaker environments is discussed. The separation method is based on Independent Component Analysis (ICA) and makes very few assumptions about the mixing process of the speech signals. Recognition experiments are performed under various conditions concerning (1) the acoustic environment, (2) the interfering speakers, and (3) the recognition system. The results obtained under these conditions are summarized as follows. (1) In a soundproof room, the separation method improves recognition accuracy by more than 20% when the SNR relative to the interfering signal is 0 to 6 dB; in a reverberant room, however, the improvement degrades to about 10%. (2) In general, recognition accuracy deteriorates as the number of interfering speakers increases, and the separation then yields a larger improvement. (3) There is no significant difference between the results for DTW isolated-word recognition and HMM continuous speech recognition, except that for the HMM CSR the improvement saturates under high-SNR conditions.
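As a rough illustration of the ICA principle behind the separation method, the Python sketch below implements a Bell-Sejnowski-style natural-gradient InfoMax update for an instantaneous two-channel linear mixture. The mixing matrix, learning rate, and synthetic Laplacian sources are illustrative assumptions only; the paper's actual method must additionally cope with convolutive mixing in real (reverberant) rooms, which this sketch does not model.

    import numpy as np

    def infomax_ica(x, lr=0.01, n_iter=500):
        # x: (n_channels, n_samples) zero-mean observed mixtures.
        # Returns an unmixing matrix W such that y = W @ x approximates the sources.
        n, T = x.shape
        W = np.eye(n)
        for _ in range(n_iter):
            y = W @ x
            phi = np.tanh(y)  # nonlinearity suited to super-Gaussian sources such as speech
            # natural-gradient InfoMax update
            W += lr * (np.eye(n) - (phi @ y.T) / T) @ W
        return W

    # Toy usage with synthetic super-Gaussian "sources" and an assumed mixing matrix
    rng = np.random.default_rng(0)
    s = rng.laplace(size=(2, 20000))
    A = np.array([[1.0, 0.6], [0.4, 1.0]])
    x = A @ s
    x -= x.mean(axis=1, keepdims=True)
    W = infomax_ica(x)
    y = W @ x  # separated estimates

The tanh nonlinearity matches super-Gaussian signals such as speech; the recovered signals are determined only up to scaling and permutation, which is the usual ICA ambiguity.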
