Using Multi-view Recognition and Meta-data Annotation to Guide a Robot's Attention

In the transition from industrial to service robotics, robots will have to deal with increasingly unpredictable and variable environments. We present a system that is able to recognize objects of a given class in an image and to identify their parts for potential interactions. The method can recognize objects from arbitrary viewpoints and generalizes to instances that have never been observed during training, even if they are partially occluded and appear against cluttered backgrounds. Our approach builds on the implicit shape model of Leibe et al. We extend it in two ways: we couple recognition to the provision of meta-data useful for a task, and we handle multiple viewpoints by integrating the model with the dense multi-view correspondence finder of Ferrari et al. Meta-data can be part labels, but also depth estimates, information on material types, or any other pixelwise annotation. We present experimental results on wheelchairs, cars, and motorbikes.
