Approaches and databases for online calibration of binaural sound localization for robotic heads

In this paper, we evaluate adaptive sound localization algorithms for robotic heads. To this end, we built a 3-degree-of-freedom head with two microphones encased in artificial pinnae (outer ears). The geometry of the head and pinnae induces temporal differences between the sounds recorded at the two microphones. These differences vary in a complex manner with the frequency of the sound, the location of the source, and the orientation of the robot. To learn the relationship between these auditory differences and the location of a sound source, we applied machine learning methods to a database of recordings covering different audio source locations and robot head orientations. Our approach estimates the position of an audio source with a mean error of 2.5 degrees in azimuth and 11 degrees in elevation. These results highlight the benefit of a two-stage regression model that exploits the properties of the artificial pinnae for elevation estimation. In this work, the algorithms were trained using ground-truth data provided by a motion capture system. We are currently generalizing the approach so that the training signal is provided online by a real-time face detection and speech detection system.
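As an illustration of the temporal differences between the two microphones, the sketch below estimates the inter-microphone delay with generalized cross-correlation using phase-transform weighting (GCC-PHAT), a standard time-delay estimation technique. The abstract does not state which features the authors extracted, so this is an assumption rather than their method; the function name `estimate_delay`, the 16 kHz sample rate, and the synthetic test signal are likewise hypothetical.

```python
import numpy as np


def estimate_delay(left, right, sample_rate):
    """Estimate the delay (in seconds) of `right` relative to `left`.

    Uses GCC-PHAT (an assumed feature, not taken from the paper).
    A positive result means the sound reached the right microphone later.
    """
    n = len(left) + len(right)  # zero-pad to avoid circular wrap-around
    spec_left = np.fft.rfft(left, n=n)
    spec_right = np.fft.rfft(right, n=n)

    # Cross-power spectrum, normalized to unit magnitude (PHAT weighting)
    cross = spec_right * np.conj(spec_left)
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain, then reorder so lags run from -max_shift to +max_shift
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))

    lag = int(np.argmax(cc)) - max_shift
    return lag / sample_rate


# Synthetic check: the right channel is the left channel delayed by 5 samples.
rng = np.random.default_rng(0)
sample_rate = 16000  # hypothetical sampling rate
delay_samples = 5
left = rng.standard_normal(2048)
right = np.concatenate((np.zeros(delay_samples), left[:-delay_samples]))

estimated = estimate_delay(left, right, sample_rate)
print(round(estimated * sample_rate))
```

A delay estimated this way could serve as one input feature to a regression model that maps auditory differences and head orientation to source azimuth; the elevation cues introduced by the pinnae are frequency dependent and would need richer spectral features.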
