Real-time implementation of blind spatial subtraction array for hands-free robot spoken dialogue system

In this paper, we construct and evaluate a hands-free robot spoken dialogue system based on a real-time blind spatial subtraction array (BSSA). BSSA is a blind source extraction method: it extracts the target speech by subtracting the power spectrum of the noise signal, which is estimated by independent component analysis (ICA), from the power spectrum of a partly enhanced target speech signal. Although BSSA reduces noise efficiently, ICA is computationally expensive, which makes BSSA difficult to run in real time. We therefore propose a real-time architecture for BSSA and build a hands-free robot spoken dialogue system on top of it. With the real-time BSSA, the system shows a 6% improvement in the speech recognition result compared with conventional speech enhancement methods.
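The subtraction step described above can be illustrated with a minimal sketch. It assumes the two power spectra for one frame are already available as lists of per-bin values; the ICA noise estimation and the partial speech enhancement are not shown. The function name and the `oversubtraction` and `floor` parameters are hypothetical and are not taken from the paper.

```python
import math


def subtract_noise_power(primary_power, noise_power, oversubtraction=1.0, floor=0.0):
    """Per-bin power-spectrum subtraction for a single frame.

    primary_power: power spectrum of the partly enhanced target speech signal
    noise_power:   power spectrum of the noise estimate (produced by ICA in BSSA)
    oversubtraction, floor: hypothetical tuning parameters (assumptions)

    Returns the estimated target magnitude for each frequency bin.
    """
    if len(primary_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")

    magnitudes = []
    for p, n in zip(primary_power, noise_power):
        diff = p - oversubtraction * n
        # A negative power is not physical, so clamp the result to a
        # fraction (floor) of the primary power before taking the root.
        magnitudes.append(math.sqrt(max(diff, floor * p)))
    return magnitudes


# Toy frame with three frequency bins.
primary = [4.0, 9.0, 1.0]
noise = [0.0, 5.0, 2.0]
print(subtract_noise_power(primary, noise))  # [2.0, 2.0, 0.0]
```

In the third bin the noise estimate exceeds the primary power, so the result is clamped rather than left negative. A nonzero `floor` would keep a small residual in such bins instead of zeroing them.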
