Azimuthal source localization using interaural coherence in a robotic dog: modeling and application

In nature, sounds from multiple sources, as well as reflections from surrounding surfaces, arrive concurrently at a listener's ears from different directions. Although all of these waveforms sum at the eardrums, humans with normal hearing can effortlessly segregate sounds of interest from echoes and other background noise. This paper presents a two-microphone technique for sound source localization to guide robotic navigation. Its fundamental structure is adopted from the binaural signal-processing scheme that biological systems use to localize sources from interaural time differences (ITDs). The two input signals are analyzed for coincidences along left/right-channel delay-line pairs, and the coincidence time instants are presented as a function of the interaural coherence (IC). Specifically, we build a spherical head model for the selected robot and apply the binaural cue selection mechanism observed in the mammalian hearing system to mitigate the effects of echoes. The source is located by determining the azimuth at which the probability density function (PDF) of the ITD cues reaches its maximum. This eliminates the localization artifacts found during tests. The experimental results of a systematic evaluation demonstrate the superior performance of the proposed method.
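The processing chain described above (per-frame ITD estimation, cue selection by interaural coherence, and a peak search over the distribution of the selected ITDs) can be illustrated with a minimal sketch. It is not the paper's implementation. Several things are assumptions made for illustration only: normalized cross-correlation stands in for the delay-line coincidence detector, the Woodworth formula ITD = (a/c)(θ + sin θ) stands in for the spherical head model, and the function names, the 5 cm head radius, the frame length, and the IC threshold of 0.9 are all hypothetical. Positive azimuth here means the sound reaches the left microphone first.

```python
import numpy as np


def frame_itd_ic(left, right, max_lag):
    """Return (lag in samples, coherence) for one frame.

    Coherence is the peak of the normalized cross-correlation over
    lags in [-max_lag, max_lag]; a positive lag means the right
    channel is delayed relative to the left one.
    """
    norm = np.sqrt(np.dot(left, left) * np.dot(right, right))
    if norm == 0.0:
        return 0, 0.0
    n = len(left)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            corr[i] = np.dot(left[:n - lag], right[lag:])
        else:
            corr[i] = np.dot(left[-lag:], right[:n + lag])
    corr /= norm
    best = int(np.argmax(corr))
    return int(lags[best]), float(corr[best])


def woodworth_azimuth(itd, head_radius=0.05, c=343.0):
    """Invert ITD = (a/c) * (theta + sin(theta)) on [-90, 90] degrees.

    Assumed spherical-head approximation; the mapping is monotonic,
    so a table lookup with linear interpolation is sufficient.
    """
    theta = np.linspace(-np.pi / 2, np.pi / 2, 1801)
    model = head_radius / c * (theta + np.sin(theta))
    return float(np.degrees(np.interp(itd, model, theta)))


def localize(left, right, fs, frame_len=512, ic_threshold=0.9,
             head_radius=0.05, c=343.0):
    """Estimate the azimuth in degrees, or None if no frame is coherent."""
    # Largest physically possible ITD under the assumed head model.
    max_itd = head_radius / c * (np.pi / 2 + 1.0)
    max_lag = int(np.ceil(max_itd * fs))

    selected = []
    for start in range(0, len(left) - frame_len + 1, frame_len):
        stop = start + frame_len
        lag, ic = frame_itd_ic(left[start:stop], right[start:stop], max_lag)
        # Cue selection: keep ITDs only from frames with high coherence,
        # which are less likely to be dominated by reflections.
        if ic >= ic_threshold:
            selected.append(lag)
    if not selected:
        return None

    # Histogram of the selected lags as a discrete stand-in for the PDF.
    counts = np.bincount(np.asarray(selected) + max_lag,
                         minlength=2 * max_lag + 1)
    best_lag = int(np.argmax(counts)) - max_lag
    return woodworth_azimuth(best_lag / fs, head_radius, c)
```

Calling `localize(left, right, fs)` on two equally long one-dimensional arrays returns an azimuth estimate in degrees. Raising `ic_threshold` discards more reverberant frames at the cost of fewer ITD samples in the histogram.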
