Real-time acoustic source localization in noisy environments for human-robot multimodal interaction

Interaction between humans involves a wealth of sensory information, both explicit communication and more subtle, unconsciously perceived signals. To enable natural human-robot interaction, robots must therefore acquire the skills to detect and meaningfully integrate information from multiple modalities. In this article, we focus on sound localization in the context of a multisensory humanoid robot that combines audio and video information to produce natural and intuitive responses to human behavior, such as directed eye-head movements toward natural stimuli. We highlight four common sound source localization algorithms and compare their performance and advantages for real-time interaction. We also briefly introduce an integrated distributed control framework called DVC, into which additional modalities such as speech recognition, visual tracking, or object recognition can easily be incorporated. Finally, we describe how the sound localization module has been integrated into our humanoid robot, CB.
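The abstract does not enumerate the four algorithms compared, but the generalized correlation method of Knapp and Carter is a classic baseline for microphone-pair localization. Below is a minimal Python sketch of time-delay estimation with GCC-PHAT and a far-field azimuth conversion; it is an illustration, not the authors' implementation, and the sampling rate, microphone spacing, and signal names are assumed values, not details taken from the article.

import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    # Generalized cross-correlation with PHAT weighting: whiten the
    # cross-spectrum so that only phase (i.e., delay) information remains.
    n = len(sig) + len(ref)                     # zero-pad to avoid circular wrap-around
    f_sig = np.fft.rfft(sig, n=n)
    f_ref = np.fft.rfft(ref, n=n)
    cross = f_sig * np.conj(f_ref)
    cross /= np.abs(cross) + 1e-12              # PHAT normalization
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:                     # restrict search to physically possible lags
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                           # positive: `sig` lags `ref`

# Synthetic two-microphone demo (assumed geometry: 0.15 m spacing, 16 kHz).
fs, d, c = 16000, 0.15, 343.0                   # sample rate, mic spacing (m), speed of sound (m/s)
rng = np.random.default_rng(0)
src = rng.standard_normal(fs)                   # 1 s of broadband noise
delay = 5                                       # inter-mic delay in samples (~0.31 ms)
left = src[delay:]                              # left mic hears the source first
right = src[:-delay] + 0.1 * rng.standard_normal(fs - delay)  # delayed, noisy copy
tau = gcc_phat(right, left, fs, max_tau=d / c)  # > 0 means sound reached the left mic first
azimuth = np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0)))
print(f"delay: {tau * 1e3:.3f} ms, azimuth: {azimuth:.1f} deg from broadside")

In a real-time setting such as the eye-head control described above, an estimator of this kind would run on short windowed frames, with the azimuth estimates smoothed over time before being fused with visual cues.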
