Speaker localization and tracking with a microphone array on a mobile robot using von Mises distribution and particle filtering

This paper deals with the problem of localizing and tracking a moving speaker over the full angular range around a mobile robot. The problem is solved by taking advantage of the phase shift between signals received at spatially separated microphones. The proposed algorithm estimates the time difference of arrival by maximizing the weighted cross-correlation function in order to determine the azimuth angle of the detected speaker. The cross-correlation is enhanced with an adaptive signal-to-noise estimation algorithm to make the azimuth estimation more robust in noisy surroundings. A post-processing technique is proposed in which the azimuths determined by the individual microphone pairs are combined into a mixture of von Mises distributions, thus producing a practical probabilistic representation of the microphone array measurement. It is shown that this distribution is inherently multimodal and that the system at hand is non-linear. Therefore, particle filtering is applied to obtain a discrete representation of the distribution function. Furthermore, the two most common microphone array geometries were analysed, and exhaustive experiments were conducted in order to qualitatively and quantitatively test the algorithm and compare the two geometries. Also, a voice activity detection algorithm based on the aforementioned signal-to-noise estimator was implemented and incorporated into the existing speaker localization system. The results show that the algorithm can reliably and accurately localize and track a moving speaker.
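The measurement chain described above (time difference of arrival, azimuth per microphone pair, von Mises mixture, particle weighting) can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the PHAT weighting stands in for the paper's SNR-based weighting, the far-field model, the speed of sound, equal mixture weights, and a shared concentration parameter `kappa` are all assumed, and every function name is hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def gcc_phat_tdoa(x1, x2, fs, max_tau):
    """Delay of x1 relative to x2 in seconds, from the peak of a weighted
    cross-correlation. PHAT weighting is an assumption of this sketch."""
    n = len(x1) + len(x2)  # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(x1, n=n) * np.conj(np.fft.rfft(x2, n=n))
    cross /= np.abs(cross) + 1e-12  # keep phase only (PHAT)
    cc = np.fft.irfft(cross, n=n)
    max_shift = int(fs * max_tau)  # physically possible lags only
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / fs


def pair_azimuths(tau, spacing):
    """Two candidate azimuths (rad) measured from the pair's baseline
    (microphone 1 towards microphone 2), assuming a far-field source.
    A single pair cannot tell the two mirror-image directions apart."""
    ratio = np.clip(SPEED_OF_SOUND * tau / spacing, -1.0, 1.0)
    theta = np.arccos(ratio)
    return theta, (-theta) % (2.0 * np.pi)


def von_mises_mixture(theta, means, kappa):
    """Equal-weight mixture of von Mises densities with shared kappa
    (both simplifications are assumptions of this sketch)."""
    theta = np.asarray(theta, dtype=float)[..., None]
    means = np.asarray(means, dtype=float)
    dens = np.exp(kappa * np.cos(theta - means)) / (2.0 * np.pi * np.i0(kappa))
    return dens.mean(axis=-1)


def update_particle_weights(particles, weights, means, kappa):
    """Multiply particle weights by the mixture likelihood and renormalize."""
    w = weights * von_mises_mixture(particles, means, kappa)
    return w / w.sum()
```

Because each pair contributes two mirror-image candidates, the resulting mixture has several modes, which is one reason a particle-based representation suits it better than a single Gaussian estimate. In practice the candidates from all pairs would first be rotated into a common robot frame before being passed as `means`.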
