The Tools Challenge: Rapid Trial-and-Error Learning in Physical Problem Solving

Many animals, and an increasing number of artificial agents, display sophisticated capabilities for perceiving and manipulating objects. But human beings remain distinctive in their capacity for flexible, creative tool use: using objects in new ways to act on the world, achieve a goal, or solve a problem. Here we introduce the "Tools" game, a simple but challenging domain for studying this behavior in humans and artificial agents. Players place objects in a dynamic scene to accomplish a goal that can be achieved only if those objects interact with other scene elements in appropriate ways: for instance, launching, blocking, supporting, or tipping them. Only a few attempts are permitted, requiring rapid trial-and-error learning when a solution is not found at first. We propose a "Sample, Simulate, Update" (SSUP) framework for modeling how people solve these challenges: exploit rich world knowledge to sample actions likely to lead to successful outcomes, simulate candidate actions before trying them out, and update beliefs about which tools and actions are best in a rapid learning loop. SSUP captures human performance well across 20 levels of the Tools game, and fits the data significantly better than alternative accounts based on deep reinforcement learning or on learning the simulator parameters online. We discuss how the Tools challenge might guide the development of better physical reasoning agents in AI, as well as better accounts of human physical reasoning and tool use.
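In code terms, the Sample, Simulate, Update loop described above might look roughly like the sketch below. This is a minimal illustration of the idea only, not the authors' model: the interfaces prior_sample, simulate, and execute, the thresholds, and the Gaussian perturbation of promising past actions are assumptions introduced here for concreteness.

```python
import random


def perturb(action, scale=0.05):
    """Jitter a (tool_index, x, y) placement with small Gaussian noise (assumed action format)."""
    tool, x, y = action
    return (tool, x + random.gauss(0.0, scale), y + random.gauss(0.0, scale))


def ssup_solve(prior_sample, simulate, execute,
               max_attempts=10, max_sims_per_attempt=20, act_threshold=0.6):
    """Illustrative Sample-Simulate-Update loop (not the authors' implementation).

    prior_sample() -> (tool, x, y)       draw a placement from object-based priors
    simulate(action) -> float in [0, 1]  noisy internal model of success likelihood
    execute(action) -> (solved, reward)  one real, costly attempt in the world
    """
    beliefs = []  # (action, value) pairs pooled from simulated and real outcomes

    for _ in range(max_attempts):
        best_action, best_value = None, -1.0

        # SAMPLE + SIMULATE: imagine candidates until one looks good enough to try.
        for _ in range(max_sims_per_attempt):
            if beliefs and random.random() < 0.5:
                # Exploit: perturb the most promising action seen so far.
                candidate = perturb(max(beliefs, key=lambda b: b[1])[0])
            else:
                # Explore: draw a fresh candidate from the prior.
                candidate = prior_sample()

            value = simulate(candidate)
            beliefs.append((candidate, value))
            if value > best_value:
                best_action, best_value = candidate, value
            if best_value >= act_threshold:
                break

        # ACT: try the best imagined candidate for real.
        solved, reward = execute(best_action)

        # UPDATE: real feedback joins the same belief pool as simulated outcomes.
        beliefs.append((best_action, reward))
        if solved:
            return best_action

    return None
```

In this reading, simulated and real outcomes feed the same pool of beliefs, so even a failed real attempt shapes which actions are proposed and simulated next; this is one way to realize the rapid trial-and-error learning the abstract describes, given the small number of permitted attempts.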
