Attending, Foveating and Recognizing Objects in Real World Scenes

Recognition in cluttered real-world scenes is a challenging problem. To find a particular object of interest within a reasonable time, a wide field of view is preferable. However, as we will show with practical experiments, robust recognition is easier if the object is foveated and subtends a considerable part of the visual field. In this paper we present a binocular system that reconciles these two conflicting requirements. The system consists of two pairs of cameras: a wide-field pair and a foveal pair. A number of object hypotheses are generated from disparities. An attentional process based on hue and 3D size guides the foveal cameras towards the most salient regions. With the object foveated and segmented in 3D, recognition is performed using scale-invariant features. The system is fully automated and runs in real time.
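To make the attentional step concrete, the following is a minimal sketch of how hypotheses might be ranked by hue and 3D size. It is not the system's actual implementation: the Gaussian scoring, the dictionary fields (`pixel_width`, `disparity`, `hue`), the parameters `hue_sigma` and `size_sigma`, and all function names are assumptions made for illustration. The only standard fact it relies on is rectified pinhole stereo geometry, where depth is Z = f·B/d and a region w pixels wide therefore has metric width W = w·B/d.

```python
import math


def metric_width(pixel_width, disparity, baseline_m):
    """Approximate physical width (metres) of a region in a rectified stereo pair.

    With depth Z = f * B / d and width W = w * Z / f, the focal length
    cancels, leaving W = w * B / d.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return pixel_width * baseline_m / disparity


def hue_distance(h1, h2):
    """Circular distance between two hues given in degrees (result in [0, 180])."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


def saliency(hyp, target_hue, target_width_m, baseline_m,
             hue_sigma=20.0, size_sigma=0.05):
    """Hypothetical saliency: product of Gaussian hue and size similarities."""
    width = metric_width(hyp["pixel_width"], hyp["disparity"], baseline_m)
    hue_term = math.exp(-0.5 * (hue_distance(hyp["hue"], target_hue) / hue_sigma) ** 2)
    size_term = math.exp(-0.5 * ((width - target_width_m) / size_sigma) ** 2)
    return hue_term * size_term


def most_salient(hyps, target_hue, target_width_m, baseline_m):
    """Return the hypothesis the foveal cameras would be directed to first."""
    return max(hyps, key=lambda h: saliency(h, target_hue, target_width_m, baseline_m))


if __name__ == "__main__":
    # Two made-up hypotheses: a reddish object about 0.2 m wide, and a larger blue one.
    hypotheses = [
        {"name": "a", "pixel_width": 100, "disparity": 50, "hue": 5.0},
        {"name": "b", "pixel_width": 300, "disparity": 40, "hue": 220.0},
    ]
    best = most_salient(hypotheses, target_hue=0.0, target_width_m=0.2, baseline_m=0.1)
    print(best["name"])
```

Multiplying the two terms means a hypothesis must match in both hue and size to score highly; a weighted sum would be an equally plausible choice if either cue alone should be able to attract attention.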
