Neuromorphic Audio–Visual Sensor Fusion on a Sound-Localizing Robot

This paper presents the first robotic system featuring audio–visual (AV) sensor fusion with neuromorphic sensors. We combine a pair of silicon cochleae and a silicon retina on a robotic platform, allowing the robot to learn sound localization through self-motion and visual feedback using an adaptive sound localization algorithm based on the interaural time difference (ITD). After training, the robot can localize sound sources (white or pink noise) in a reverberant environment with an RMS error of 4–5° in azimuth. We also investigate the AV source binding problem, and conduct an experiment to test how effectively an audio event can be matched with its corresponding visual event based on onset time. Despite the simplicity of this method and a large number of false visual events in the background, a correct match is made 75% of the time in the experiment.
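
The onset-based matching mentioned above can be illustrated with a minimal sketch. The abstract does not specify the matching rule, so everything here is an assumption made for illustration: the function name `match_by_onset`, the nearest-onset criterion, and the `max_lag` tolerance (in seconds) are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def match_by_onset(audio_onset: float,
                   visual_onsets: Sequence[float],
                   max_lag: float = 0.1) -> Optional[int]:
    """Pick the visual event whose onset is closest in time to the audio onset.

    Hypothetical rule (not from the paper): a visual event is a candidate
    only if its onset lies within `max_lag` seconds of the audio onset.
    Returns the index of the best candidate, or None if there is none.
    """
    best_index: Optional[int] = None
    best_gap: Optional[float] = None
    for index, visual_onset in enumerate(visual_onsets):
        gap = abs(visual_onset - audio_onset)
        if gap > max_lag:
            continue  # too far apart in time to be the same physical event
        if best_gap is None or gap < best_gap:
            best_index, best_gap = index, gap
    return best_index


# Illustrative onset times in seconds (made-up values, not measured data).
visual_events = [0.20, 0.97, 1.40, 2.10]
matched = match_by_onset(1.00, visual_events)
```

In this toy example the second visual event (onset 0.97 s) is the only one inside the tolerance window around the audio onset at 1.00 s, so it is selected. Background visual events with unrelated onsets fall outside the window and are ignored, which is why such a simple rule can still work when many false visual events are present.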
