Reward-based learning of optimal cue integration in audio and visual depth estimation

Many real-world robotics applications suffer from imprecision and noise when they rely on a single information source. Using additional cues or sensors is therefore often the method of choice. One example considered in this paper is depth estimation, where multiple visual and auditory cues can be combined to increase the precision and robustness of the final estimates. Rather than using a weighted average of the individual estimates, we use a reward-based learning scheme that adapts to the given relations among the cues. This approach has previously been shown to mimic the development of near-optimal cue integration in infants, and it makes few assumptions about the distribution of inputs. We demonstrate that it can substantially improve performance in two different depth estimation systems, one auditory and one visual.
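
For orientation, the weighted-average baseline that the abstract contrasts with is often implemented as inverse-variance weighting, which is the optimal linear combination when the cues are independent and Gaussian. The following is a minimal sketch of that baseline only, not of the reward-based scheme proposed in the paper; the function name `fuse_cues` and the numeric values are illustrative assumptions and do not come from the paper.

```python
def fuse_cues(estimates, variances):
    """Inverse-variance weighted average of independent depth estimates.

    Assumes each cue is unbiased with known variance (an assumption of
    this sketch; the paper's learning scheme avoids such assumptions).
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate")
    # Reliability of a cue is the inverse of its variance.
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    # Each cue is weighted by its share of the total reliability.
    fused = sum(r * e for r, e in zip(reliabilities, estimates)) / total
    # The fused estimate is never less reliable than the best single cue.
    fused_variance = 1.0 / total
    return fused, fused_variance


# Hypothetical example: a visual and an auditory depth estimate in metres.
depth, variance = fuse_cues([2.0, 2.4], [0.04, 0.16])
print(depth, variance)
```

The more reliable cue (lower variance) dominates the fused estimate. This weighting requires the cue variances to be known in advance and the cues to be independent, which is the kind of assumption a learned integration scheme can relax.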
