Improvement of speaker localization by considering multipath interference of sound wave for binaural robot audition

This paper presents an improved speaker localization method for binaural robot audition, based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT). The problem with conventional direction-of-arrival (DOA) estimation based on GCC-PHAT is multipath interference: in binaural robot audition, a sound wave reaches the microphones via both a front-head path and a back-head path. This paper describes a new time delay factor for the GCC-PHAT method that compensates for multipath interference, assuming a spherical robot head. In addition, the limit that the sampling frequency places on time difference of arrival (TDOA) estimation is overcome by applying maximum likelihood (ML) estimation in the frequency domain. Experiments conducted on the SIG-2 humanoid robot show that, compared with conventional DOA estimation, the proposed method reduces localization errors by 17.8 degrees on average and by over 35 degrees in side directions.
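The conventional GCC-PHAT TDOA estimate and its sampling-frequency limit can be sketched in NumPy as follows, assuming two equal-length microphone signals `x1`, `x2` sampled at `fs`. The integer-lag peak of the inverse FFT is quantized to 1/fs (62.5 µs at a hypothetical 16 kHz). The paper's frequency-domain ML estimator is not specified here. As an assumed stand-in, the same PHAT-weighted cross-spectrum is evaluated at fractional candidate delays around that peak. The function names, the `step` grid, and the free-field mapping θ = arcsin(cτ/d) with a hypothetical microphone spacing `d` are illustrative only, not the paper's code.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau, step=0.01):
    """GCC-PHAT delay of x2 relative to x1 (positive if x2 lags), in seconds.

    Returns (coarse, fine): coarse is the conventional integer-lag peak,
    quantized to 1/fs; fine re-evaluates the same PHAT-weighted
    cross-spectrum at fractional delays within +/-1 sample of that peak
    (an assumed stand-in for a frequency-domain ML search).
    """
    n = len(x1) + len(x2)                    # zero-pad so the correlation is linear
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only

    # Conventional estimate: integer-lag peak of the generalized cross-correlation.
    cc = np.fft.irfft(cross, n=n)
    max_shift = max(1, int(round(max_tau * fs)))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max..+max
    coarse = int(np.argmax(cc)) - max_shift

    # Sub-sample refinement: steer the cross-spectrum to fractional lags,
    # so the estimate is no longer restricted to the sampling grid.
    k = np.arange(len(cross))                # rfft bin indices
    cand = coarse + np.linspace(-1.0, 1.0, int(round(2.0 / step)) + 1)
    steer = np.exp(2j * np.pi * np.outer(cand, k) / n)
    score = np.real(steer @ cross)
    fine = cand[int(np.argmax(score))]
    return coarse / fs, fine / fs


def doa_free_field(tau, d, c=343.0):
    """Free-field mapping tau = d*sin(theta)/c, theta in degrees.

    d is a hypothetical microphone spacing in metres; no head is modelled.
    """
    return np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0)))
```

Refining only within one sample of the coarse peak keeps the steering matrix small (about 200 candidates) rather than scanning the whole lag range at sub-sample resolution.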

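The front-head and back-head paths on a spherical head can be illustrated with a rigid-sphere, far-field model in the spirit of Woodworth's formula. This is a sketch under stated assumptions, not the paper's time delay factor. The near ear is assumed to receive only the direct wave. The shadowed far ear is assumed to receive two creeping waves of equal amplitude. The head radius `r = 0.09` m is a placeholder, not the SIG-2 value. For an azimuth θ in [0, π/2], measured from straight ahead, the delays are τ_f = r(θ + sin θ)/c and τ_b = r(π − θ + sin θ)/c. The function names are hypothetical.

```python
import numpy as np


def head_path_delays(theta, r=0.09, c=343.0):
    """Interaural delays (s) along the front-head and back-head paths.

    Rigid-sphere, far-field geometry. theta is the source azimuth in radians,
    0 = straight ahead, pi/2 = toward the near ear, valid for 0 <= theta <= pi/2.
    r (head radius, m) is a hypothetical placeholder value.
    """
    theta = np.asarray(theta, dtype=float)
    lead = r * np.sin(theta)                       # near ear is ahead of the head centre by this path length
    tau_front = (r * theta + lead) / c             # wave creeps around the front of the head
    tau_back = (r * (np.pi - theta) + lead) / c    # wave creeps around the back of the head
    return tau_front, tau_back


def far_ear_response(theta, freqs, r=0.09, c=343.0):
    """Two-path transfer function at the far ear, relative to the near ear.

    Assumes (hypothetically) that both creeping waves arrive with equal
    amplitude; a real head attenuates the longer path more.
    """
    tau_front, tau_back = head_path_delays(theta, r, c)
    w = 2 * np.pi * np.asarray(freqs, dtype=float)
    return np.exp(-1j * w * tau_front) + np.exp(-1j * w * tau_back)
```

At θ = π/2 the two paths have equal length. The front-path delay r(π/2 + 1)/c is then about 675 µs for r = 0.09 m, compared with about 525 µs from a free-field model with spacing d = 2r. At other azimuths the two-path sum has spectral nulls at f = (2m + 1)/(2(τ_b − τ_f)), and its phase is not linear in frequency. A single-delay GCC-PHAT model does not capture that distortion.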