Phy-Q: A Benchmark for Physical Reasoning

Humans are adept at reasoning about how physical objects behave when choosing actions to accomplish tasks, whereas this remains a major challenge for AI. To facilitate research on this problem, we propose a new benchmark that requires an agent to reason about physical scenarios and act accordingly. Drawing on the physical knowledge humans acquire in infancy and the capabilities robots need to operate in real-world environments, we identify 15 essential physical scenarios. For each scenario, we create a wide variety of distinct task templates, and we ensure that all task templates within the same scenario can be solved using one specific physical rule. This design lets us evaluate two distinct levels of generalization: local generalization and broad generalization. We conduct an extensive evaluation with human players, learning agents with varying input types and architectures, and heuristic agents with different strategies. The benchmark yields a Phy-Q (physical reasoning quotient) score that reflects an agent's physical reasoning ability. Our evaluation shows that 1) all agents fall short of human performance, and 2) learning agents, even those with good local generalization ability, struggle to learn the underlying physical reasoning rules and fail to generalize broadly. We encourage the development of intelligent agents with broad generalization abilities in physical domains. URL: https://github.com/phy-q/benchmark
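To make the two levels of generalization concrete, the following is a minimal Python sketch of how train/test splits could be built. It assumes that local generalization means testing on unseen tasks from templates that were seen during training, and that broad generalization means testing on entire held-out templates whose scenario was still seen during training. The `Task` record, the `local_split` and `broad_split` functions, and the scenario labels are hypothetical illustrations, not the benchmark's actual API or data.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Task(NamedTuple):
    scenario: str  # one of the physical scenarios (labels below are made up)
    template: str  # a task template belonging to that scenario
    variant: int   # index of a concrete task generated from the template


def local_split(tasks, test_fraction=0.2, seed=0):
    """Local generalization (assumed): test tasks come from templates also seen in training."""
    rng = random.Random(seed)
    by_template = defaultdict(list)
    for task in tasks:
        by_template[(task.scenario, task.template)].append(task)
    train, test = [], []
    for key in sorted(by_template):
        group = list(by_template[key])
        rng.shuffle(group)
        # Keep at least one task of each template on each side (assumes >= 2 tasks per template).
        n_test = min(len(group) - 1, max(1, int(len(group) * test_fraction)))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def broad_split(tasks, held_out):
    """Broad generalization (assumed): whole templates are held out; their scenario stays in training."""
    train = [t for t in tasks if (t.scenario, t.template) not in held_out]
    test = [t for t in tasks if (t.scenario, t.template) in held_out]
    return train, test


# Toy data: 2 hypothetical scenarios x 3 templates x 5 variants (illustrative counts only).
tasks = [Task(s, f"{s}-t{j}", k)
         for s in ("roll", "fall") for j in range(3) for k in range(5)]

local_train, local_test = local_split(tasks)
broad_train, broad_test = broad_split(
    tasks, held_out={("roll", "roll-t2"), ("fall", "fall-t2")})
```

The only difference between the two splits is the unit that is held out: individual tasks in the local case, whole templates in the broad case. Under this reading, an agent that does well on `local_test` but poorly on `broad_test` has learned template-specific regularities rather than the scenario's shared physical rule.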
