Object affordance detection with relationship-aware network

Object affordance detection, which aims to understand functional attributes of objects, is of great significance for an autonomous robot to achieve a humanoid object manipulation. In this paper, we propose a novel relationship-aware convolutional neural network, which takes the symbiotic relationship between multiple affordances and the combinational relationship between the affordance and objectness into consideration, to predict the most probable affordance label for each pixel in the object. Different from the existing CNN-based methods that rely on separate and intermediate object detection step, our proposed network directly produces the pixel-wise affordance maps from an input image in an end-to-end manner. Specifically, there are three key components in our proposed network: Coord-ASPP module introducing CoordConv in atrous spatial pyramid pooling (ASPP) to refine the feature maps, relationship-aware module linking the affordances and corresponding objects to explore the relationships, and online sequential extreme learning machine auxiliary attention module focusing on individual affordances further to assist relationship-aware module. The experimental results on two public datasets have shown the merits of each module and demonstrated the superiority of our relationship-aware network against the state of the arts.

[1]  Hema Swetha Koppula,et al.  Learning human activities and object affordances from RGB-D videos , 2012, Int. J. Robotics Res..

[2]  Narasimhan Sundararajan,et al.  A Fast and Accurate Online Sequential Learning Algorithm for Feedforward Networks , 2006, IEEE Transactions on Neural Networks.

[3]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[4]  Darwin G. Caldwell,et al.  AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[5]  James M. Rehg,et al.  Affordance Prediction via Learned Object Attributes , 2011 .

[6]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[7]  Sinisa Todorovic,et al.  A Multi-scale CNN for Affordance Segmentation in RGB Images , 2016, ECCV.

[8]  David Zhang,et al.  SVM and ELM: Who Wins? Object Recognition with Deep Convolutional Features from ImageNet , 2015, ArXiv.

[9]  Manuel Lopes,et al.  Learning Object Affordances: From Sensory--Motor Coordination to Imitation , 2008, IEEE Transactions on Robotics.

[10]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[12]  Junying Hu,et al.  A Deep Neural Network Based on ELM for Semi-supervised Learning of Image Classification , 2017, Neural Processing Letters.

[13]  Ian D. Reid,et al.  SceneCut: Joint Geometric and Object Segmentation for Indoor Scenes , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Jason Yosinski,et al.  An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution , 2018, NeurIPS.

[15]  Yiannis Aloimonos,et al.  Affordance detection of tool parts from geometric features , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[16]  Danica Kragic,et al.  Visual object-action recognition: Inferring object affordances from human demonstration , 2011, Comput. Vis. Image Underst..

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Sergey Levine,et al.  Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection , 2016, ISER.

[19]  Yonggwan Won,et al.  Regularized online sequential learning algorithm for single-hidden layer feedforward neural networks , 2011, Pattern Recognit. Lett..

[20]  Lihi Zelnik-Manor,et al.  How to Evaluate Foreground Maps , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Kenli Li,et al.  An Ensemble CNN2ELM for Age Estimation , 2018, IEEE Transactions on Information Forensics and Security.

[22]  Juergen Gall,et al.  Weakly Supervised Affordance Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Nikolaos G. Tsagarakis,et al.  Object-based affordances detection with Convolutional Neural Networks and dense Conditional Random Fields , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[24]  Xiaogang Wang,et al.  Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Florentin Wörgötter,et al.  Bootstrapping the Semantics of Tools: Affordance Analysis of Real World Objects on a Per-part Basis , 2016, IEEE Transactions on Cognitive and Developmental Systems.

[26]  Chee Kheong Siew,et al.  Extreme learning machine: Theory and applications , 2006, Neurocomputing.

[27]  Hongming Zhou,et al.  Extreme Learning Machine for Regression and Multiclass Classification , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[28]  Nikolaos G. Tsagarakis,et al.  Detecting object affordances with Convolutional Neural Networks , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[29]  三嶋 博之 The theory of affordances , 2008 .

[30]  Honglak Lee,et al.  Deep learning for detecting robotic grasps , 2013, Int. J. Robotics Res..