Human-robot interface using robust speech recognition and user localization based on noise separation device

This paper introduces a robust human-robot interface (HRI) system that combines speech recognition with user localization. To make indoor speech recognition robust to unknown noise and acoustic reverberation, a blind source separation (BSS) algorithm is implemented with block-wise processing on a digital signal processing board to guarantee real-time operation. A reverberation-robust sound source localization algorithm that operates on the separated signals is also proposed. Although the BSS method cannot completely preserve the room acoustic information, the proposed localization algorithm overcomes this problem through target channel selection and target-emphasized enhancement. The developed algorithms are integrated into a commercial robot system to provide a complete voice-enabled HRI. A series of tests evaluates the BSS-based speech recognition and user localization methods, and the results show that both perform well even under severe non-stationary noise conditions.
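
The abstract does not spell out how the direction of the user is computed from the microphone signals. As a rough illustration of the general idea behind two-microphone sound source localization, the sketch below estimates the time delay between two channels with GCC-PHAT, a widely used baseline, and converts it to an arrival angle. This is not the localization algorithm proposed in the paper: the function names, the microphone spacing `mic_distance`, and the far-field assumption are all hypothetical choices made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` (seconds) with GCC-PHAT.

    Generic baseline for illustration only, not the paper's method.
    """
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear, not circular
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def doa_from_delay(tau, mic_distance):
    """Convert a delay (seconds) to a far-field arrival angle in degrees.

    `mic_distance` (metres) is a hypothetical two-microphone spacing.
    """
    s = np.clip(SPEED_OF_SOUND * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4096)
    # Simulate a second channel that receives the same signal 5 samples later.
    sig = np.concatenate((np.zeros(5), ref[:-5]))
    tau = gcc_phat_delay(sig, ref, fs)
    print("delay (samples):", tau * fs)
    print("angle (degrees):", doa_from_delay(tau, mic_distance=0.2))
```

The PHAT weighting discards magnitude and keeps only phase, which is one common way to reduce the smearing of the correlation peak caused by reverberation. A system like the one described would apply such an estimate to BSS outputs rather than raw microphone signals, which is why the abstract stresses target channel selection and enhancement.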
