Detecting object affordances with Convolutional Neural Networks

We present a novel, real-time method for detecting object affordances from RGB-D images. Our method trains a deep Convolutional Neural Network (CNN) end-to-end to learn deep features from the input data. The CNN uses an encoder-decoder architecture to produce smooth, dense label predictions. The input data are represented as multiple modalities so that the network can learn features more effectively. Our method sets a new benchmark for object affordance detection, improving accuracy by 20% over state-of-the-art methods that rely on hand-designed geometric features. Furthermore, we deploy the detector on a full-size humanoid robot (WALK-MAN) and demonstrate that the robot can perform grasps after efficiently detecting object affordances.
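
To make the encoder-decoder idea concrete, the following is a minimal PyTorch sketch of a network that maps a multi-modal RGB-D input to per-pixel affordance labels. It is an illustration only, not the authors' implementation: the SegNet-style pooling-index unpooling, the six-channel input (RGB plus an assumed three-channel depth encoding), the shallow two-stage depth, and the eight affordance classes are assumptions chosen for readability.

```python
# Minimal sketch (not the paper's implementation): an encoder-decoder CNN
# for dense affordance labelling of a multi-modal RGB-D input.
# Assumptions: 6 input channels (RGB + 3-channel depth encoding), 8 classes.
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # return_indices=True lets the decoder unpool with the same indices,
        # which is what gives this style of network its smooth label maps.
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)

    def forward(self, x):
        x = self.conv(x)
        x, idx = self.pool(x)
        return x, idx


class DecoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, idx):
        x = self.unpool(x, idx)
        return self.conv(x)


class AffordanceNet(nn.Module):
    def __init__(self, in_channels=6, num_classes=8):
        super().__init__()
        self.enc1 = EncoderBlock(in_channels, 64)
        self.enc2 = EncoderBlock(64, 128)
        self.dec2 = DecoderBlock(128, 64)
        self.dec1 = DecoderBlock(64, 64)
        self.classifier = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        x, i1 = self.enc1(x)
        x, i2 = self.enc2(x)
        x = self.dec2(x, i2)
        x = self.dec1(x, i1)
        return self.classifier(x)  # per-pixel affordance logits


if __name__ == "__main__":
    net = AffordanceNet()
    rgb_plus_depth = torch.randn(1, 6, 240, 320)  # one multi-modal frame
    logits = net(rgb_plus_depth)
    labels = logits.argmax(dim=1)                 # dense affordance map
    print(labels.shape)                           # torch.Size([1, 240, 320])
```

In this style of network, the decoder upsamples with the pooling indices saved by the encoder rather than with learned interpolation, which helps keep object boundaries sharp in the dense prediction; the per-pixel logits would then be trained with a standard pixel-wise cross-entropy loss against the ground-truth affordance labels.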
