Detecting object affordances with Convolutional Neural Networks

We present a novel, real-time method for detecting object affordances from RGB-D images. Our method trains a deep Convolutional Neural Network (CNN) end-to-end to learn deep features from the input data. The CNN uses an encoder-decoder architecture to produce smooth, dense label predictions. The input data are represented as multiple modalities so that the network can learn features more effectively. Our method sets a new benchmark for object affordance detection, improving accuracy by 20% over state-of-the-art methods that rely on hand-designed geometric features. Furthermore, we deploy the detector on a full-size humanoid robot (WALK-MAN) and demonstrate that the robot can perform grasps after efficiently detecting object affordances.
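
To make the encoder-decoder idea concrete, the following is a minimal PyTorch sketch of a network that maps a multi-modal RGB-D input to per-pixel affordance labels. It is an illustration only, not the authors' implementation: the SegNet-style pooling-index unpooling, the six-channel input (RGB plus an assumed three-channel depth encoding), the shallow two-stage depth, and the eight affordance classes are assumptions chosen for readability.

```python
# Minimal sketch (not the paper's implementation): an encoder-decoder CNN
# for dense affordance labelling of a multi-modal RGB-D input.
# Assumptions: 6 input channels (RGB + 3-channel depth encoding), 8 classes.
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # return_indices=True lets the decoder unpool with the same indices,
        # which is what gives this style of network its smooth label maps.
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)

    def forward(self, x):
        x = self.conv(x)
        x, idx = self.pool(x)
        return x, idx


class DecoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, idx):
        x = self.unpool(x, idx)
        return self.conv(x)


class AffordanceNet(nn.Module):
    def __init__(self, in_channels=6, num_classes=8):
        super().__init__()
        self.enc1 = EncoderBlock(in_channels, 64)
        self.enc2 = EncoderBlock(64, 128)
        self.dec2 = DecoderBlock(128, 64)
        self.dec1 = DecoderBlock(64, 64)
        self.classifier = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        x, i1 = self.enc1(x)
        x, i2 = self.enc2(x)
        x = self.dec2(x, i2)
        x = self.dec1(x, i1)
        return self.classifier(x)  # per-pixel affordance logits


if __name__ == "__main__":
    net = AffordanceNet()
    rgb_plus_depth = torch.randn(1, 6, 240, 320)  # one multi-modal frame
    logits = net(rgb_plus_depth)
    labels = logits.argmax(dim=1)                 # dense affordance map
    print(labels.shape)                           # torch.Size([1, 240, 320])
```

In this style of network, the decoder upsamples with the pooling indices saved by the encoder rather than with learned interpolation, which helps keep object boundaries sharp in the dense prediction; the per-pixel logits would then be trained with a standard pixel-wise cross-entropy loss against the ground-truth affordance labels.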
