Speaker localization using the TDOA-based feature matrix for a humanoid robot

Human-robot interaction has recently attracted increasing research attention, and within this field speech signal processing is of particular interest. In this paper, we report on a six-microphone speaker localization system for MAHRU, a humanoid robot developed at KIST. For sound source localization, we propose a time delay of arrival (TDOA)-based feature matrix together with an algorithm based on the minimum sum of absolute errors (MSAE). The TDOA-based feature matrix is a simple database matrix of TDOA values calculated for pairs of microphones installed on the robot. To verify the performance of the system, we present experimental results for speech sources in all directions within a distance of 5 m, with source heights divided into three levels.
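The abstract does not give MAHRU's microphone geometry, the TDOA estimator, or the resolution of the database, so the following Python sketch is illustrative only and rests on stated assumptions: a hypothetical six-microphone circular array of radius 0.1 m in a horizontal plane, a speed of sound of 343 m/s, candidate directions on a 5° azimuth grid at a 2 m reference distance, and noise-free measured TDOAs. It builds a TDOA-based feature matrix with one row per candidate direction and one column per microphone pair (15 pairs for six microphones), then selects the row with the minimum sum of absolute errors against a measured TDOA vector. Source height is ignored here, and this is not the authors' implementation.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def circular_array(n_mics=6, radius=0.1):
    """Hypothetical layout: n_mics evenly spaced on a horizontal circle (metres)."""
    return [(radius * math.cos(2 * math.pi * k / n_mics),
             radius * math.sin(2 * math.pi * k / n_mics))
            for k in range(n_mics)]


def source_at(azimuth_deg, distance):
    """2-D source position at the given azimuth (degrees) and distance (metres)."""
    a = math.radians(azimuth_deg)
    return (distance * math.cos(a), distance * math.sin(a))


def tdoa_vector(source, mics, pairs):
    """Expected TDOA in seconds, t_i - t_j, for each microphone pair (i, j)."""
    dists = [math.dist(source, m) for m in mics]
    return [(dists[i] - dists[j]) / SPEED_OF_SOUND for i, j in pairs]


def build_feature_matrix(mics, azimuths_deg, distance=2.0):
    """Database matrix: one row per candidate azimuth, one column per mic pair."""
    pairs = list(itertools.combinations(range(len(mics)), 2))
    matrix = [tdoa_vector(source_at(az, distance), mics, pairs)
              for az in azimuths_deg]
    return matrix, pairs


def msae_localize(measured, matrix, azimuths_deg):
    """Return the candidate azimuth whose row minimises the sum of absolute errors."""
    best = min(range(len(matrix)),
               key=lambda r: sum(abs(f - m) for f, m in zip(matrix[r], measured)))
    return azimuths_deg[best]


if __name__ == "__main__":
    mics = circular_array()
    azimuths = list(range(0, 360, 5))
    F, pairs = build_feature_matrix(mics, azimuths)
    # Simulated talker at 3 m and 42 degrees; the estimate snaps to the grid.
    measured = tdoa_vector(source_at(42, 3.0), mics, pairs)
    print(msae_localize(measured, F, azimuths))
```

In a real system the measured TDOAs would come from a time-delay estimator such as generalized cross-correlation, a step omitted here. One plausible motivation for an absolute-error criterion is that a single badly estimated microphone pair inflates the total cost less than it would under a squared-error criterion.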
