Web-enhanced object category learning for domestic robots

We present a system architecture for domestic robots that allows them to learn an object category from a single sample object taught by a human. We explore the situation in which a human teaches a robot a novel object and the robot then extends what it has learned by using a large amount of image data from the Internet. The main goal of this research is to let a robot improve its own learning while minimizing the time and effort a human must spend training it. Our active learning approach consists of learning the object's name through a speech interface and creating a visual object model with a depth-based attention model adapted to the robot's personal space. Given the object's name as a keyword, a large number of object-related images are collected from two main image sources (Google Images and the LabelMe website). We address the problem of separating good training samples from noisy images in two steps: (1) similar-image selection using a Simile Selector Classifier, and (2) non-real image filtering using a variant of Gaussian Discriminant Analysis. After web image selection, object category classifiers are trained and then tested on different objects of the same category. Our experiments demonstrate the effectiveness of our robot learning approach.
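To make the second filtering step more concrete, the following is a minimal sketch of textbook two-class Gaussian Discriminant Analysis with a shared covariance matrix, applied to separating "real" from "non-real" images. It is not the variant used in this work, whose details the abstract does not give. The feature vectors, the label convention (1 = real photograph, 0 = non-real image such as a drawing), and the function names `fit_gda` and `predict_gda` are all assumptions made for illustration.

```python
import numpy as np


def fit_gda(X, y):
    """Fit standard two-class GDA with a shared covariance matrix.

    X: (n, d) array of image feature vectors (hypothetical features).
    y: (n,) array of labels, 1 = real image, 0 = non-real image (assumed).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    phi = y.mean()                      # prior probability of class 1
    mu0 = X[y == 0].mean(axis=0)        # class means
    mu1 = X[y == 1].mean(axis=0)
    # Center every sample on the mean of its own class.
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = centered.T @ centered / len(X)
    sigma += 1e-6 * np.eye(X.shape[1])  # small ridge for numerical stability
    return phi, mu0, mu1, sigma


def predict_gda(params, X):
    """Return 1 where the 'real' class has the higher posterior, else 0."""
    phi, mu0, mu1, sigma = params
    X = np.asarray(X, dtype=float)
    inv = np.linalg.inv(sigma)

    def log_joint(mu, prior):
        d = X - mu
        # Mahalanobis term per row plus log prior; the shared
        # normalisation constant cancels between the two classes.
        return -0.5 * np.einsum("ij,jk,ik->i", d, inv, d) + np.log(prior)

    return (log_joint(mu1, phi) > log_joint(mu0, 1.0 - phi)).astype(int)


if __name__ == "__main__":
    # Synthetic stand-in features; real use would need image descriptors.
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(50, 2))
    non_real = rng.normal(6.0, 1.0, size=(50, 2))
    X = np.vstack([real, non_real])
    y = np.array([1] * 50 + [0] * 50)
    params = fit_gda(X, y)
    print(predict_gda(params, [[0.0, 0.0], [6.0, 6.0]]))
```

In a pipeline like the one described, images predicted as 0 would be discarded before the object category classifiers are trained. The choice of features that distinguish photographs from non-real images is left open here.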
