Towards binocular active vision in a robot head system

This paper presents the first results of a pilot study of an active binocular vision system that combines binocular vergence, object recognition and attention control in a unified framework. The prototype can identify, target, verge on and recognize objects in a highly cluttered scene without calibration or any other knowledge of the camera geometry. This is achieved by performing all image analysis in a symbolic space, without creating explicit pixel-space maps. The system structure is based on the ‘searchlight metaphor’ of biological systems. In the pilot investigation, the maximum vergence error was 6.4 pixels, and seven of nine known objects were recognized in a highly cluttered environment. Finally, a “stepping stone” visual search strategy was demonstrated. It took a total of 40 saccades to find two known objects in the workspace, and the two objects never appeared together within the field of view of any single saccade.
