Improved sound source localization in horizontal plane for binaural robot audition

An improved sound source localization (SSL) method has been developed for binaural robots equipped with two microphones inside artificial pinnae. It builds on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT). The conventional GCC-PHAT-based SSL method has two main problems on a binaural robot platform: 1) sound waves diffract around the contours of the robot head and produce multipath interference, which degrades localization accuracy, and 2) front-back ambiguity limits the localization range to half of the horizontal plane. The diffraction problem was overcome by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. The ambiguity problem was overcome by utilizing the amplification effect of the pinnae, which allows localization over the entire azimuth. In experiments with two dummy heads equipped with small or large pinnae, the new time delay factor reduced the average localization error by 8.91° (3.21° vs. 12.12°) compared with the conventional GCC-PHAT method. The success rate for front-back disambiguation using the pinnae amplification effect was 29.76% (93.46% vs. 72.02%) better on average over the entire azimuth than that of a conventional head-related transfer function (HRTF)-based method.
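
To make the two ingredients named above concrete, the following is a minimal sketch of GCC-PHAT time delay estimation and of how a spherical-head assumption changes the mapping from delay to azimuth. It is not the authors' implementation. The function names, the head radius of 0.09 m, the speed of sound of 343 m/s, and the sign convention (positive delay means the first signal lags the second) are all assumptions made for illustration. The spherical-head mapping uses the classic Woodworth approximation, ITD = (a/c)(θ + sin θ), as a stand-in; the abstract does not specify the exact form of the new time delay factor.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT."""
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

def azimuth_free_field(tau, radius=0.09, c=343.0):
    """Free-field mapping (no head): tau = (2a / c) * sin(theta)."""
    s = np.clip(c * tau / (2.0 * radius), -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

def azimuth_spherical_head(tau, radius=0.09, c=343.0):
    """Woodworth mapping (assumed): tau = (a / c) * (theta + sin(theta))."""
    theta = np.linspace(-np.pi / 2, np.pi / 2, 1801)   # 0.1 degree grid
    itd = radius / c * (theta + np.sin(theta))         # monotonically increasing
    return float(np.degrees(np.interp(tau, itd, theta)))

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    right = rng.standard_normal(4096)
    left = np.concatenate((np.zeros(5), right[:-5]))   # left lags by 5 samples
    tau = gcc_phat(left, right, fs, max_tau=0.001)
    print("delay [s]:", tau)
    print("free-field azimuth [deg]:", azimuth_free_field(tau))
    print("spherical-head azimuth [deg]:", azimuth_spherical_head(tau))
```

For the same measured delay, the free-field mapping returns a larger angle than the spherical-head mapping, because it ignores the extra path the wave travels around the head. This illustrates why a diffraction-aware delay model can reduce localization error. Neither mapping resolves front-back ambiguity, since a delay alone cannot distinguish a source at θ from its mirror image behind the head.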
