Object-based affordances detection with Convolutional Neural Networks and dense Conditional Random Fields

We present a new method to detect object affordances in real-world scenes using deep Convolutional Neural Networks (CNNs), an object detector, and dense Conditional Random Fields (CRFs). Our system first trains an object detector to generate bounding box candidates from the images. A deep CNN is then used to learn the depth features from these bounding boxes. Finally, these feature maps are post-processed with a dense CRF to improve the predictions along class boundaries. Experimental results on our new challenging dataset show that the proposed approach outperforms recent state-of-the-art methods by a substantial margin. Furthermore, building on the detected affordances, we introduce a grasping method that is robust to noisy data. We demonstrate the effectiveness of our framework on the full-size humanoid robot WALK-MAN using different objects in real-world scenarios.
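The CRF post-processing step above can be illustrated with a toy mean-field update. The sketch below is not the paper's implementation: a real dense CRF uses fully connected Gaussian pairwise potentials over all pixel pairs, whereas this hypothetical example smooths per-pixel class distributions toward nearest neighbors only, to show how label noise inside an object region gets corrected by its context.

```python
import math

def softmax(scores):
    """Convert raw class scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def meanfield_step(unary, q, w=1.0):
    """One simplified mean-field update over a 1-D row of pixels.

    unary : per-pixel class scores from the CNN (the unary potentials)
    q     : current per-pixel class distributions
    w     : weight of the pairwise (smoothing) term

    Each pixel's distribution is pulled toward its neighbors' current
    distributions -- a crude stand-in for the dense CRF's Gaussian
    edge potentials, which act over all pixel pairs in the full model.
    """
    n, k = len(q), len(q[0])
    new_q = []
    for i in range(n):
        msg = [0.0] * k
        for j in (i - 1, i + 1):          # nearest neighbors only
            if 0 <= j < n:
                for c in range(k):
                    msg[c] += q[j][c]
        scores = [unary[i][c] + w * msg[c] for c in range(k)]
        new_q.append(softmax(scores))
    return new_q
```

Running a few iterations on a row whose middle pixel is mislabeled shows the effect: the unary term alone picks the wrong class for that pixel, but the pairwise messages from its confidently labeled neighbors flip it, which is how the CRF sharpens affordance masks along object boundaries.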
