On-line object detection: a robotics challenge

Object detection is a fundamental ability for robots interacting within an environment. While stunningly effective, state-of-the-art deep learning methods require huge amounts of labeled images and hours of training which does not favour such scenarios. This work presents a novel pipeline resulting from integrating (Maiettini et al. in 2017 IEEE-RAS 17th international conference on humanoid robotics (Humanoids), 2017) and (Maiettini et al. in 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), 2018), which naturally trains a robot to detect novel objects in few seconds. Moreover, we report on an extended empirical evaluation of the learning method, justifying that the proposed hybrid architecture is key in leveraging powerful deep representations while maintaining fast training time of large scale Kernel methods. We validate our approach on the Pascal VOC benchmark (Everingham et al. in Int J Comput Vis 88(2): 303–338, 2010), and on a challenging robotic scenario (iCubWorld Transformations (Pasquale et al. in Rob Auton Syst 112:260–281, 2019). We address real world use-cases and show how to tune the method for different speed/accuracy trades-off. Lastly, we discuss limitations and directions for future development.

[1]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Wojciech Zaremba,et al.  Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[3]  Matthias W. Seeger,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[4]  Ronan Collobert,et al.  Learning to Segment Object Candidates , 2015, NIPS.

[5]  Sven Behnke,et al.  RGB-D object detection and semantic segmentation for autonomous manipulation in clutter , 2018, Int. J. Robotics Res..

[6]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Timothy Patten,et al.  Action Selection for Interactive Object Segmentation in Clutter , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[8]  Giorgio Metta,et al.  The design and validation of the R1 personal humanoid , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[9]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[10]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[12]  Jana Kosecka,et al.  Synthesizing Training Data for Object Detection in Indoor Scenes , 2017, Robotics: Science and Systems.

[13]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[14]  Lorenzo Rosasco,et al.  FALKON: An Optimal Large Scale Kernel Method , 2017, NIPS.

[15]  Lorenzo Rosasco,et al.  Speeding-Up Object Detection Training for Robotics with FALKON , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[16]  David A. McAllester,et al.  Cascade object detection with deformable part models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Kah Kay Sung,et al.  Learning and example selection for object and pattern detection , 1995 .

[18]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[19]  Lorenzo Rosasco,et al.  Object identification from few examples by improving the invariance of a Deep Convolutional Neural Network , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[20]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[21]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[22]  John K. Tsotsos,et al.  Revisiting active perception , 2016, Autonomous Robots.

[23]  Lei Zhang,et al.  Towards Human-Machine Cooperation: Self-Supervised Sample Mining for Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Wolfram Burgard,et al.  The limits and potentials of deep learning for robotics , 2018, Int. J. Robotics Res..

[25]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  David Cohn,et al.  Active Learning , 2010, Encyclopedia of Machine Learning.

[27]  Bernhard Schölkopf,et al.  Sparse Greedy Matrix Approximation for Machine Learning , 2000, International Conference on Machine Learning.

[28]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[29]  Giorgio Metta,et al.  YARP: Yet Another Robot Platform , 2006 .

[30]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[31]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[32]  Yuan Wang,et al.  Focal Loss in 3D Object Detection , 2018, IEEE Robotics and Automation Letters.

[33]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[34]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[36]  Ronan Collobert,et al.  Learning to Refine Object Segments , 2016, ECCV.

[37]  Giorgio Metta,et al.  Active object recognition on a humanoid robot , 2012, 2012 IEEE International Conference on Robotics and Automation.

[38]  Abhinav Gupta,et al.  The Curious Robot: Learning Visual Representations via Physical Interactions , 2016, ECCV.

[39]  Yousef Saad,et al.  Iterative methods for sparse linear systems , 2003 .

[40]  Ian Taylor,et al.  Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[41]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Giulio Sandini,et al.  The iCub humanoid robot: An open-systems platform for research in cognitive development , 2010, Neural Networks.

[43]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[44]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Lorenzo Rosasco,et al.  Are we Done with Object Recognition? The iCub robot's Perspective , 2017, Robotics Auton. Syst..

[46]  Lorenzo Rosasco,et al.  Interactive data collection for deep learning object detectors on humanoid robots , 2017, 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids).

[47]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[48]  Lorenzo Rosasco,et al.  Enabling Depth-Driven Visual Attention on the iCub Humanoid Robot: Instructions for Use and New Perspectives , 2015, Front. Robot. AI.

[49]  Aurko Roy,et al.  Learning to Remember Rare Events , 2017, ICLR.

[50]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[51]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).