A Survey of Knowledge Representation and Retrieval for Learning in Service Robotics

Within service robotics, researchers have devoted considerable effort to learning motions and manipulations for robot task execution. Robot learning is a broad problem: it spans object detection, action recognition, motion planning, localization, and knowledge representation and retrieval, and it intertwines computer vision and machine learning techniques. In this paper, we focus on how knowledge can be gathered, represented, and reproduced to solve problems, as researchers have done over the past decades. We discuss the problems that have existed in robot learning and the solutions, technologies, or developments (if any) that have contributed to solving them. Specifically, we look at three broad categories involved in task representation and retrieval for robotics: 1) activity recognition from demonstrations, 2) scene understanding and interpretation, and 3) task representation in robotics datasets and networks. Within each section, we discuss major breakthroughs and how their methods address present issues in robot learning and manipulation.
