SLAM-based online calibration of asynchronous microphone array for robot audition

This paper addresses the online calibration of an asynchronous microphone array for robots. Conventional microphone array techniques require many transfer function measurements to calibrate microphone locations, as well as a multi-channel A/D converter for inter-microphone synchronization. We address both problems with a framework that combines Simultaneous Localization and Mapping (SLAM) and beamforming in an online manner. To do so, we regard the estimation of microphone locations, the sound source location, and microphone clock differences as the mapping, self-localization, and observation-error components of SLAM, respectively. In our framework, the SLAM process calibrates the locations and clock differences of the microphones every time the array observes a sound such as a handclap, while a beamforming process acts as a cost function for deciding when calibration has converged, by localizing the sound with the estimated locations and clock differences. After calibration, beamforming is used for sound source localization. We implemented a prototype system using Extended Kalman Filter (EKF) based SLAM and Delay-and-Sum Beamforming (DS-BF). Experimental results showed that microphone locations and clock differences were estimated properly from 10–15 sound events (handclaps), and that the sound source localization error with the estimated parameters was smaller than the beamforming grid size, i.e., the theoretically lowest attainable error.
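To make the correspondence between SLAM quantities and calibration quantities concrete, the following is a minimal sketch, not the authors' implementation: an EKF whose state stacks the sound source position with every microphone's position and clock offset, updated from time-difference-of-arrival (TDOA) measurements of a single sound event such as a handclap. Microphone 0 is taken as the spatial and clock reference, the geometry is 2-D, and the variable names and noise levels are illustrative assumptions; the prediction step and the handling of successive sound events are omitted. The grid search at the end stands in for the DS-BF convergence check, scoring grid points by TDOA residuals rather than by summed signal power.

```python
import numpy as np

C = 343.0  # speed of sound [m/s]


def predict_tdoa(state, n_mics):
    """Predicted TDOA (relative to reference mic 0) for mics 1..n_mics-1.

    state = [sx, sy, m1x, m1y, d1, ..., m(N-1)x, m(N-1)y, d(N-1)],
    with mic 0 fixed at the origin with zero clock offset.
    """
    src = state[:2]
    mics = state[2:].reshape(n_mics - 1, 3)
    d0 = np.linalg.norm(src)                        # distance to reference mic
    di = np.linalg.norm(mics[:, :2] - src, axis=1)  # distances to the other mics
    return (di - d0) / C + mics[:, 2]               # propagation delay + clock offset


def jacobian(state, n_mics):
    """Jacobian of predict_tdoa with respect to the full state."""
    src = state[:2]
    mics = state[2:].reshape(n_mics - 1, 3)
    d0 = np.linalg.norm(src)
    H = np.zeros((n_mics - 1, state.size))
    for i in range(n_mics - 1):
        diff_i = src - mics[i, :2]
        di = np.linalg.norm(diff_i)
        H[i, 0:2] = (diff_i / di - src / d0) / C        # w.r.t. source position
        H[i, 2 + 3 * i:4 + 3 * i] = -diff_i / (di * C)  # w.r.t. mic i position
        H[i, 4 + 3 * i] = 1.0                           # w.r.t. mic i clock offset
    return H


def ekf_update(state, P, z, n_mics, meas_var=(1e-4) ** 2):
    """One EKF measurement update from one observed sound event (TDOA vector z)."""
    H = jacobian(state, n_mics)
    y = z - predict_tdoa(state, n_mics)             # innovation
    S = H @ P @ H.T + meas_var * np.eye(n_mics - 1)
    K = P @ H.T @ np.linalg.inv(S)
    state = state + K @ y
    P = (np.eye(state.size) - K @ H) @ P
    return state, P


def grid_localize(state, z, n_mics, grid):
    """Stand-in for DS-BF localization: pick the grid point whose predicted TDOAs
    (using the calibrated mic positions and clock offsets) best match z."""
    best, best_cost = None, np.inf
    for g in grid:
        trial = state.copy()
        trial[:2] = g
        cost = np.sum((z - predict_tdoa(trial, n_mics)) ** 2)
        if cost < best_cost:
            best, best_cost = np.asarray(g), cost
    return best
```

Under these assumptions, each detected handclap would trigger one `ekf_update`; once `grid_localize`, run with the current microphone estimates, agrees with the EKF's own source estimate to within the grid spacing, the calibration can be declared converged and the beamformer reused for ordinary sound source localization.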
