Bridging between Computer and Robot Vision through Data Augmentation: a Case Study on Object Recognition

Despite the impressive progress brought by deep networks in visual object recognition, robot vision is still far from being a solved problem. The most successful convolutional architectures have been developed starting from ImageNet, a large-scale collection of images of object categories downloaded from the Web. These images are very different from the situated and embodied visual experience of robots deployed in unconstrained settings. To reduce the gap between these two visual experiences, this paper proposes a simple yet effective data augmentation layer that zooms in on the object of interest and simulates the object detection outcome of a robot vision system. The layer, which can be used with any deep convolutional architecture, increases object recognition performance by up to 7% in experiments performed on three different benchmark databases. An implementation of our robot data augmentation layer has been made publicly available.
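The abstract does not specify how the layer zooms in on the object or how it models the outcome of a robot's object detector. As a rough illustration only, the following sketch assumes that a bounding box is available for each training image and imitates an imperfect detector by randomly enlarging and shifting that box before cropping and resizing. The function `robot_style_crop` and its parameters (`scale_range`, `shift_frac`, `out_size`) are hypothetical names chosen for this example; this is not the authors' released implementation.

```python
import numpy as np


def robot_style_crop(image, box, rng, scale_range=(1.0, 1.5),
                     shift_frac=0.1, out_size=(224, 224)):
    """Hypothetical sketch of a detection-style zoom augmentation.

    image: array of shape (H, W) or (H, W, C).
    box: (x0, y0, x1, y1) object bounding box in pixels, assumed inside the image.
    rng: a numpy.random.Generator.
    Returns a crop around the (jittered) box, resized to out_size
    with nearest-neighbour sampling.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box must have positive width and height")

    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    bw, bh = x1 - x0, y1 - y0

    # Assumption: a detector returns a box somewhat larger than the object,
    # so enlarge the box by a random factor (adds context around the object).
    s = rng.uniform(*scale_range)
    bw, bh = bw * s, bh * s

    # Assumption: detector localisation is imperfect, so shift the box centre
    # by a random fraction of its (enlarged) size.
    cx += rng.uniform(-shift_frac, shift_frac) * bw
    cy += rng.uniform(-shift_frac, shift_frac) * bh

    # Clip the jittered box to the image bounds.
    nx0 = int(max(0, np.floor(cx - bw / 2)))
    ny0 = int(max(0, np.floor(cy - bh / 2)))
    nx1 = int(min(w, np.ceil(cx + bw / 2)))
    ny1 = int(min(h, np.ceil(cy + bh / 2)))
    crop = image[ny0:ny1, nx0:nx1]

    # Nearest-neighbour resize so every output has the size the network expects.
    oh, ow = out_size
    rows = np.arange(oh) * crop.shape[0] // oh
    cols = np.arange(ow) * crop.shape[1] // ow
    return crop[rows][:, cols]


rng = np.random.default_rng(0)
img = np.zeros((100, 120, 3), dtype=np.uint8)
patch = robot_style_crop(img, (30, 20, 70, 60), rng)
print(patch.shape)
```

Because the function only transforms the input image, it can sit in front of any convolutional network as a preprocessing step, which matches the abstract's claim that the layer is architecture-agnostic. Nearest-neighbour resizing keeps the sketch dependency-free; a real pipeline would more likely use bilinear interpolation from an image library.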
