PHYRE: A New Benchmark for Physical Reasoning

Understanding and reasoning about physics is an important ability of intelligent agents. We develop the PHYRE benchmark for physical reasoning, which comprises a set of simple classical-mechanics puzzles in a 2D physical environment. The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles. We test several modern learning algorithms on PHYRE and find that they fall short of solving the puzzles efficiently. We expect that PHYRE will spur the development of novel sample-efficient agents that learn efficient yet useful models of physics. For code, and to play PHYRE yourself, please visit this https URL.
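To make the benchmark's interaction model concrete, below is a minimal sketch of a random-search baseline against the PHYRE simulator. It assumes the released `phyre` Python package; the names `get_fold`, `eval_setup_to_action_tier`, `initialize_simulator`, `simulate_action`, and `SimulationStatus` follow the public API, but exact signatures may differ between package versions, and the evaluation setup and 100-attempt budget are arbitrary illustrative choices, not the paper's protocol.

    # Random-search sketch on PHYRE (assumes the public `phyre` package).
    import random

    import phyre

    # Choose an evaluation setup and one of its train/dev/test folds.
    eval_setup = 'ball_cross_template'
    train_tasks, dev_tasks, test_tasks = phyre.get_fold(eval_setup, 0)

    # Build a simulator over the training tasks in the matching action tier.
    action_tier = phyre.eval_setup_to_action_tier(eval_setup)
    simulator = phyre.initialize_simulator(train_tasks, action_tier)

    task_index = 0
    for attempt in range(1, 101):
        # In the single-ball tier an action is (x, y, radius), each in [0, 1];
        # actions that overlap existing scene objects come back as invalid.
        action = [random.random() for _ in range(3)]
        result = simulator.simulate_action(task_index, action, need_images=False)
        if result.status == phyre.SimulationStatus.SOLVED:
            print('Solved task %s after %d random actions.'
                  % (train_tasks[task_index], attempt))
            break

Random search of this kind is the simplest agent one can run on the benchmark; the learned agents evaluated in the paper instead rank candidate actions and update on observed rewards.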
