Using Recognition to Guide a Robot's Attention

In the transition from industrial to service robotics, robots will have to deal with increasingly unpredictable and variable environments. We present a system that is able to recognize objects of a certain class in an image and to identify their parts for potential interactions. This is demonstrated for object instances that have never been observed during training, and under partial occlusion and against cluttered backgrounds. Our approach builds on the Implicit Shape Model of Leibe and Schiele, and extends it to couple recognition to the provision of meta-data useful for a task. Meta-data can for example consist of part labels or depth estimates. We present experimental results on wheelchairs and cars.

[1]  Luc Van Gool,et al.  Omnidirectional Vision Based Topological Navigation , 2007, International Journal of Computer Vision.

[2]  Jitendra Malik,et al.  Shape Context: A New Descriptor for Shape Matching and Object Recognition , 2000, NIPS.

[3]  Ronen Basri,et al.  Example Based 3D Reconstruction from Single 2D Images , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[4]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[5]  Feng Han,et al.  Bayesian reconstruction of 3D shapes and scenes from a single image , 2003, First IEEE International Workshop on Higher-Level Knowledge in 3D Modeling and Motion Analysis, 2003. HLK 2003..

[6]  Luc Van Gool,et al.  Dynamic 3D Scene Analysis from a Moving Vehicle , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Friedrich Fraundorfer,et al.  Topological mapping, localization and navigation using image collections , 2007, 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[8]  Luc Van Gool,et al.  Towards Multi-View Object Class Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  David Mumford,et al.  Neuronal Architectures for Pattern-theoretic Problems , 1995 .

[10]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[11]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[13]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[14]  Antonio Torralba,et al.  Depth from Familiar Objects: A Hierarchical Model for 3D Scenes , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  B. Schiele,et al.  Interleaved Object Categorization and Segmentation , 2003, BMVC.

[16]  Alexei A. Efros,et al.  Geometric context from a single image , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[18]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  Joel L. Davis,et al.  Large-Scale Neuronal Theories of the Brain , 1994 .

[20]  Ashutosh Saxena,et al.  Robotic Grasping of Novel Objects , 2006, NIPS.

[21]  Sinisa Segvic,et al.  Large scale vision-based navigation without an accurate global reconstruction , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[23]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.