Online simultaneous localization and mapping of multiple sound sources and asynchronous microphone arrays

This paper presents an online simultaneous localization and mapping (SLAM) method that estimates the positions of multiple moving sound sources and stationary robots while synchronizing the microphone arrays attached to those robots. Because a single robot equipped with a microphone array can estimate only the directions of sound sources, the two-dimensional source positions are obtained by triangulating the source directions estimated by multiple robots. In addition, sound mixtures can be separated accurately by treating the distributed microphone arrays as one large array. Both tasks require the microphone arrays to be localized and synchronized, and several methods have been proposed for this purpose. These methods, however, work only when a single sound source exists, because they assume that the time differences of arrival (TDOAs) between microphones are directly observed. To overcome this limitation, we propose a unified state-space model that encodes the source positions, the robot positions, and the time offsets between microphone arrays as latent variables. Given the TDOAs and directions of arrival (DOAs) estimated by separating the observed mixture sounds into source sounds, the latent variables are estimated jointly and online using a FastSLAM 2.0 algorithm that can handle an unknown, time-varying number of moving sound sources.
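The triangulation step mentioned above can be illustrated with a minimal sketch. It is not the method proposed in this paper: it assumes the robot positions are already known and that each DOA is a bearing in a shared global frame, whereas the proposed model estimates the robot positions and time offsets jointly. The function name `triangulate` and its arguments are hypothetical. Each bearing defines a line through its robot, and the source position is taken as the least-squares intersection of those lines.

```python
import math

def triangulate(robot_positions, doas):
    """Least-squares intersection of bearing lines in 2D.

    robot_positions: list of (x, y) robot positions (assumed known here).
    doas: list of bearings in radians, measured in a shared global frame
          (an assumption of this sketch).
    """
    # Each bearing line satisfies n_i . x = n_i . p_i, where
    # n_i = (-sin t_i, cos t_i) is the unit normal of the line through
    # robot position p_i. Accumulate the 2x2 normal equations.
    s_xx = s_xy = s_yy = r_x = r_y = 0.0
    for (px, py), theta in zip(robot_positions, doas):
        nx, ny = -math.sin(theta), math.cos(theta)
        b = nx * px + ny * py
        s_xx += nx * nx
        s_xy += nx * ny
        s_yy += ny * ny
        r_x += nx * b
        r_y += ny * b

    det = s_xx * s_yy - s_xy * s_xy
    if abs(det) < 1e-12:
        # A single robot, or parallel bearings, gives direction only.
        raise ValueError("bearings are (nearly) parallel; position is not observable")

    # Solve the symmetric 2x2 system in closed form.
    x = (s_yy * r_x - s_xy * r_y) / det
    y = (s_xx * r_y - s_xy * r_x) / det
    return x, y


# Usage: three robots observing a source placed at (2, 2).
robots = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
source = (2.0, 2.0)
bearings = [math.atan2(source[1] - py, source[0] - px) for px, py in robots]
estimate = triangulate(robots, bearings)
```

With a single robot the determinant vanishes and the function raises an error, which mirrors the point that one array alone yields only a direction. The sketch treats each bearing as an infinite line rather than a ray, so it does not check that the estimate lies in front of each robot.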
