Invariant Feature Mappings for Generalizing Affordance Understanding Using Regularized Metric Learning

This paper presents an approach for learning invariant features for object affordance understanding. A major challenge for a robotic agent acquiring a deeper understanding of affordances is grounding semantics in sensory input. Understanding what in the representation of an object makes it afford an action enables more efficient manipulation, interchangeable use of objects that are not visually similar, transfer learning, and robot-to-human communication. Our approach uses a metric learning algorithm that learns a feature transform encouraging objects that afford the same action to lie close together in feature space. We regularize the learning so that irrelevant features are penalized, allowing the agent to identify which parts of the sensory input cause the object to afford the action. From this, we show how the agent can abstract the affordance and reason about the similarity between different affordances.
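
The abstract does not spell out the optimization, so the following is a rough illustration only of one plausible reading: a linear transform L trained with a contrastive pull/push objective over pairs that share (or do not share) an affordance label, plus an L2,1 penalty on the columns of L so that entire input features can be driven to zero. All names and parameters here (regularized_metric_learning, lam, margin, and so on) are hypothetical and not the authors' method or API.

    import numpy as np

    def regularized_metric_learning(X, y, lam=0.1, lr=0.01, n_iter=2000,
                                    margin=1.0, seed=0):
        """Hypothetical sketch: learn a linear map L so that samples sharing
        an affordance label are pulled together and others pushed apart,
        with an L2,1 penalty on the columns of L that shrinks input features
        irrelevant to the affordance."""
        n, d = X.shape
        rng = np.random.default_rng(seed)
        L = rng.normal(scale=0.1, size=(d, d))  # the learned feature transform

        for _ in range(n_iter):
            i, j = rng.integers(0, n, size=2)   # one random pair per step
            diff = X[i] - X[j]
            z = L @ diff                        # pair difference in learned space
            grad = np.zeros_like(L)
            if y[i] == y[j]:
                # same affordance: gradient of ||L diff||^2 pulls the pair together
                grad += 2.0 * np.outer(z, diff)
            elif z @ z < margin:
                # different affordance inside the margin: push the pair apart
                grad -= 2.0 * np.outer(z, diff)
            # L2,1 penalty: column k of L weights input feature k, so shrinking
            # whole columns zeroes out features irrelevant to the affordance
            col_norms = np.linalg.norm(L, axis=0) + 1e-12
            grad += lam * L / col_norms
            L -= lr * grad
        return L

Under this reading, columns of L with near-zero norm mark input dimensions the metric has discarded, which is one way an agent could link an affordance back to specific sensory features.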
