Paper Information - Reasoning and Generalization in RL: A Tool Use Perspective

Learning to use tools to solve a variety of tasks is an innate human ability and has also been observed in animals in the wild. However, the underlying mechanisms required to learn tool use are abstract and widely contested in the literature. In this paper, we study tool use in the context of reinforcement learning and propose a framework for analyzing generalization inspired by a classic study of tool-using behavior, the trap-tube task. Recently, it has become common in reinforcement learning to measure generalization performance on a single test set of environments. We instead propose transfers that produce multiple test sets, each used to measure a specified type of generalization, inspired by abilities demonstrated by animal and human tool users. The source code to reproduce our experiments is publicly available at this https URL.
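
To make the contrast between a single test set and multiple transfer test sets concrete, here is a minimal Python sketch. It is not the authors' code: the transfer names (`transfer_a`, `transfer_b`), the integer environment ids, and the `solves` callable are hypothetical stand-ins chosen for illustration. It shows only that a pooled success rate can hide large differences between transfer types.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# Each test environment is tagged with the (hypothetical) transfer that produced it.
TestEnv = Tuple[str, Hashable]


def success_by_transfer(
    test_envs: List[TestEnv], solves: Callable[[Hashable], bool]
) -> Dict[str, float]:
    """Return the success rate separately for each transfer test set."""
    wins: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for transfer, env in test_envs:
        totals[transfer] += 1
        wins[transfer] += int(solves(env))
    return {name: wins[name] / totals[name] for name in totals}


def pooled_success(
    test_envs: List[TestEnv], solves: Callable[[Hashable], bool]
) -> float:
    """Return the success rate over all test environments as one test set."""
    return sum(int(solves(env)) for _, env in test_envs) / len(test_envs)


# Toy data (assumption): integer ids stand in for environments, and the
# "policy" solves exactly the even-numbered ones.
TOY_ENVS: List[TestEnv] = [
    ("transfer_a", 0), ("transfer_a", 2), ("transfer_a", 4), ("transfer_a", 6),
    ("transfer_b", 1), ("transfer_b", 3), ("transfer_b", 5), ("transfer_b", 8),
]


def toy_solves(env: Hashable) -> bool:
    return isinstance(env, int) and env % 2 == 0


per_transfer = success_by_transfer(TOY_ENVS, toy_solves)
pooled = pooled_success(TOY_ENVS, toy_solves)
```

In this toy setup the pooled rate is 0.625, while the per-transfer rates are 1.0 and 0.25, which is the kind of difference a single test set would not reveal.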
