Combination of foveal and peripheral vision for object recognition and pose estimation

In this paper, we present a real-time vision system that integrates a number of algorithms using monocular and binocular cues to achieve robustness in realistic settings, for tasks such as object recognition, tracking, and pose estimation. The system consists of two binocular camera pairs: a peripheral pair for disparity-based attention and a foveal pair for higher-level processes. In this way, the conflicting requirements of a wide field of view and high resolution are reconciled. One important property of the system is that the step from task specification through object recognition to pose estimation is completely automatic, combining both appearance and geometric models. Experimental evaluation is performed in a realistic indoor environment with occlusions, clutter, and changing lighting and background conditions.
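
The abstract describes a two-stage architecture: wide-field peripheral cameras drive disparity-based attention, and the high-resolution foveal cameras then handle recognition and pose estimation on the attended target. The sketch below is a minimal, hypothetical illustration of that control flow only; the function names, block-matching attention score, and placeholder foveal stage are assumptions for illustration and do not reproduce the paper's actual algorithms.

```python
import numpy as np

def disparity_attention(left_periph: np.ndarray, right_periph: np.ndarray,
                        max_disp: int = 32) -> tuple:
    """Crude block-matching stand-in for disparity-based attention:
    pick the image location with the strongest near-field (large-disparity)
    response as the next fixation target."""
    h, w = left_periph.shape
    block = 16
    best_score, best_xy = -np.inf, (w // 2, h // 2)
    for y in range(0, h - block, block):
        for x in range(max_disp, w - block, block):
            patch = left_periph[y:y + block, x:x + block]
            # Score each candidate horizontal shift d in the right image;
            # the best-matching shift approximates the patch disparity.
            scores = [
                -np.abs(patch - right_periph[y:y + block, x - d:x - d + block]).mean()
                for d in range(max_disp)
            ]
            d_best = int(np.argmax(scores))
            if d_best > 0 and scores[d_best] > best_score:
                best_score, best_xy = scores[d_best], (x, y)
    return best_xy

def foveal_pipeline(foveal_left: np.ndarray, foveal_right: np.ndarray):
    """Placeholder for the high-resolution stages (appearance-based
    recognition followed by model-based pose estimation) that would run
    once gaze is fixated on the attended target."""
    object_id = "unknown"   # appearance-based recognition would go here
    pose = np.eye(4)        # geometric model fitting would refine this
    return object_id, pose

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lp, rp = rng.random((120, 160)), rng.random((120, 160))
    target = disparity_attention(lp, rp)
    obj, T = foveal_pipeline(lp, rp)  # foveal images would be separate views in practice
    print("fixation target:", target, "recognized object:", obj)
```

The point of the sketch is the division of labour: the attention stage works on the full peripheral field at coarse resolution, while recognition and pose estimation are only ever run on the small foveal region selected by it.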
