Fast self-supervised on-line training for object recognition specifically for robotic applications

Today, most recognition pipelines are trained either at an off-line stage, which provides the system with pre-segmented images and predefined objects, or at an on-line stage, which requires a human supervisor to tediously control the learning. Self-supervised on-line training of recognition pipelines without human intervention is a highly desirable goal, as it allows systems to learn unknown, environment-specific objects on the fly. We propose a fast, automatic system that extracts and learns unknown objects with minimal human intervention by employing a two-level pipeline that combines the advantages of RGB-D sensors for object extraction with those of high-resolution cameras for object recognition. Furthermore, we significantly improve recognition results with local features by implementing a novel keypoint orientation scheme, which leads to highly invariant yet discriminative object signatures. Using only one image per object for training, our system achieves a recognition rate of 79% for 18 objects, benchmarked on 42 scenes with random poses, scales and occlusion, while taking only 7 seconds to train. Additionally, we evaluate our orientation scheme on the state-of-the-art 56-object SDU dataset, boosting accuracy for one training view per object by 37% to 78% and reaching a peak of 98% with 11 training views.
