Automatic estimation of reverberation time with robot speech to improve ICA-based robot audition

This paper presents an ICA-based robot audition system that automatically estimates the reverberation time of the environment by using the robot's own speech. The system is based on multi-channel semi-blind independent component analysis (MCSB-ICA), a microphone-array source separation method that can separate user and robot speech in reverberant environments. Estimating the reverberation time (RT) correctly is critical, because an inappropriate RT degrades separation performance and increases processing time. Unlike most previous methods, which assume the RT is given in advance, our method estimates the RT from the echo intensity of the robot's own speech. It has three steps: the robot speaks a sentence in a new environment, the system calculates the relative powers of the echoes, and it estimates the RT by linear regression on those powers. Experimental results show that this method sets an appropriate RT for MCSB-ICA in real-world environments, improving word correctness by up to 6 points and reducing processing time by up to 60%.
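The three steps above can be illustrated with a minimal sketch. The paper does not give its exact regression features, so the following is an assumption: frame powers of the recorded echo are converted to dB, a line is fitted to the decay, and the RT is extrapolated as the time for a 60 dB drop (the standard RT60 definition). The function name and the synthetic decaying-noise echo are hypothetical stand-ins for the robot's own-speech recording.

```python
import numpy as np

def estimate_rt60_from_decay(echo, fs, frame_len=1024):
    """Hypothetical sketch: fit a line to the log-power decay of a
    recorded echo and extrapolate the time of a 60 dB drop (RT60)."""
    n_frames = len(echo) // frame_len
    frames = echo[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Relative echo power per frame, in dB (small floor avoids log(0))
    power_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    t = (np.arange(n_frames) + 0.5) * frame_len / fs  # frame centers [s]
    slope, _ = np.polyfit(t, power_db, 1)             # decay rate [dB/s]
    return -60.0 / slope if slope < 0 else float("inf")

# Synthetic echo: white noise with an exponential envelope whose
# power falls 60 dB over rt_true seconds (i.e., true RT60 = rt_true).
fs = 16000
t = np.arange(fs) / fs
rt_true = 0.5
envelope = 10 ** (-3 * t / rt_true)
echo = np.random.default_rng(0).standard_normal(len(t)) * envelope
print(estimate_rt60_from_decay(echo, fs))  # close to rt_true
```

In practice the fit would be restricted to the portion of the decay above the noise floor; this sketch assumes a clean recording.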
