Sound source separation of moving speakers for robot audition

This paper addresses sound source separation and speech recognition for moving sound sources. Real-world applications such as robots must cope with both moving and stationary sound sources, yet most studies assume only stationary sources. We introduce two key techniques for handling moving sources, Adaptive Step-size control (AS) and Optima Controlled Recursive Average (OCRA), to improve blind source separation. Using these techniques, we implemented a real-time robot audition system for our humanoid robot ASIMO, which has an 8-channel microphone array, with HARK, our open-source software for robot audition. The performance of the system is shown through sound source separation of moving sources and automatic speech recognition of the separated speech.

Kazuhiro Nakadai | Yuji Hasegawa | Hiroshi Tsujino | Hirofumi Nakajima
