Binaural Systems in Robotics

Audition is often described by physiologists as the most important sense in humans, owing to its essential role in communication and socialization. Surprisingly, however, robotics began to take an interest in this modality only in the 2000s, prompted by cognitive robotics and human–robot interaction. Since then, numerous contributions have been made to the field of robot audition, ranging from sound localization to scene analysis. Binaural approaches were investigated first, then largely abandoned after mixed results. Nevertheless, recent years have seen renewed interest in binaural active audition, that is, in the opportunities and challenges opened up by coupling binaural sensing with robot motion. This chapter presents a comprehensive state of the art of binaural approaches to robot audition. Although the literature on binaural audition and, more generally, on acoustics and signal processing is a fundamental source of knowledge, the tasks, constraints, and environments of robotics raise original issues. These issues are reviewed first, followed by the most prominent contributions, platforms, and projects. Two lines of research in binaural active audition, conducted by the present authors, are then outlined; one of them is closely connected to the psychology of perception.
