Interface for Barge-in Free Spoken Dialogue System Based on Sound Field Reproduction and Microphone Array

We propose a barge-in free spoken dialogue interface that combines sound field control with a microphone array. A conventional spoken dialogue system that relies on an acoustic echo canceller must estimate the room transfer function, and must re-estimate it whenever the transfer function is changed by interference. This estimation is difficult, however, when the user and the system speak simultaneously. To resolve this problem, we propose a sound field control technique that prevents the system's response sound from being observed at the microphone. Combined with a microphone array, the proposed method achieves high elimination performance without any adaptive process. Experiments on sound elimination and speech recognition confirm the efficacy of the proposed interface.
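The idea of controlling the sound field so that the response sound vanishes at the microphone can be illustrated with a small numerical sketch. The abstract does not specify the filter design, so the following assumes a generic regularized least-squares inverse filter computed per frequency bin; the function name `design_control_filters`, the random transfer functions, and the choice of four loudspeakers with three control points (two ears, one microphone) are all hypothetical, and the microphone array stage is not modeled.

```python
import numpy as np


def design_control_filters(G, d, reg=1e-8):
    """Regularized least-squares loudspeaker filters, one set per frequency bin.

    G   : (n_bins, n_points, n_speakers) complex transfer functions from each
          loudspeaker to each control point (assumed measured in advance).
    d   : (n_points,) desired response at the control points.
    reg : small Tikhonov regularization constant (illustrative value).
    Returns H with shape (n_bins, n_speakers).
    """
    n_bins, _, n_speakers = G.shape
    H = np.zeros((n_bins, n_speakers), dtype=complex)
    for k in range(n_bins):
        Gk = G[k]
        # Solve (G^H G + reg * I) h = G^H d for this bin.
        A = Gk.conj().T @ Gk + reg * np.eye(n_speakers)
        H[k] = np.linalg.solve(A, Gk.conj().T @ d)
    return H


# Hypothetical setup: random complex values stand in for measured room responses.
rng = np.random.default_rng(0)
n_bins, n_points, n_speakers = 64, 3, 4
G = (rng.standard_normal((n_bins, n_points, n_speakers))
     + 1j * rng.standard_normal((n_bins, n_points, n_speakers)))

# Desired field: unit response at the two ear positions, silence at the microphone.
d = np.array([1.0, 1.0, 0.0])

H = design_control_filters(G, d)

# Response actually produced at each control point when the filters are applied.
reproduced = np.einsum("kps,ks->kp", G, H)
ear_error = np.max(np.abs(reproduced[:, :2] - 1.0))
mic_leak = np.max(np.abs(reproduced[:, 2]))
print("max ear error:", ear_error)
print("max microphone leakage:", mic_leak)
```

Because the filters are fixed once the transfer functions are known, nothing in this sketch adapts while the system is running, which mirrors the "no adaptive process" property claimed above. In a real room the transfer functions drift, so the residual at the microphone would not be this small.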
