Learning physical theories from dynamical scenes

Tomer Ullman 1 (tomeru@mit.edu), Andreas Stuhlmüller 2 (ast@mit.edu), Noah Goodman 2 (ngoodman@stanford.edu), & Joshua Tenenbaum 1 (jbt@mit.edu)

1 Department of Brain and Cognitive Sciences, MIT, 2 Department of Psychology, Stanford University

Figure 1: Illustration of the domain explored in this paper, showing the motion and interaction of four different pucks moving on a two-dimensional plane governed by latent physical properties and dynamical laws, such as mass, friction, and global and pairwise forces.

Abstract

Humans acquire their most basic physical concepts early in development, but continue to enrich and expand their intuitive physics throughout life as they are exposed to more and varied dynamical environments. We introduce a hierarchical Bayesian framework to explain how people can learn physical theories across multiple timescales and levels of abstraction. In contrast to previous Bayesian models of theory acquisition (Tenenbaum, Kemp, Griffiths, & Goodman, 2011), we work with more expressive probabilistic program representations suitable for learning the forces and properties that govern how objects interact in dynamic scenes unfolding over time. We compare our model and human learners on a challenging task of inferring novel physical laws in microworlds given short movies. People are generally able to perform this task and behave in line with model predictions. Yet they also make systematic errors suggestive of how a top-down Bayesian approach to learning might be complemented by a more bottom-up, feature-based approximate inference scheme, to best explain theory learning at an algorithmic level.

Keywords: theory learning; intuitive physics; probabilistic inference; physical reasoning

Introduction

People regularly reason about the physical properties of the world around them.
Glancing at a book on a table, we can rapidly tell if it is about to fall, how it will slide if pushed, tumble if it falls on a hard floor, sag if pressured, bend if bent. This ability for physical scene understanding begins to develop in infancy, and is suggested as a core component of human cognitive architecture (Spelke & Kinzler, 2007). While some aspects of this capacity are likely innate (Baillargeon, 2002), learning also occurs at multiple levels from infancy into adulthood. Infants develop notions of containment, stability, and gravitational force over the first few months of life (Baillargeon, 2002). With exposure, young children acquire an intuitive understanding of remote controls and magnets. Most young children and adults quickly adjust to the ‘unnatural physics’ of many video games, and astronauts can learn to adjust to weightless environments.

How, in principle, can people learn intuitive physics from experience? How can they grasp structure at multiple levels, ranging from deep enduring laws acquired early in infancy to the wide spectrum of novel and unfamiliar dynamics that adults encounter and can adapt to? How much data are required, and how are the data brought to bear on candidate theory hypotheses? These are the questions we ask here.

We take as a starting point the computational-level view of theory learning as rational statistical inference over hierarchies of structured representations (Tenenbaum et al., 2011). Previous work in this tradition focused on relatively spare and static logical descriptions of theories and data; for example, a law of magnetism might be represented as ‘if magnet(x) and magnet(y) then attract(x,y)’, and the learner's data might consist of propositions such as ‘attracts(object_a, object_b)’ (Kemp, Tenenbaum, Niyogi, & Griffiths, 2010).
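To make this representational style concrete, such a law and its latent class predicate can be written as a small program. The sketch below uses Church-style notation; the predicate names and the prior probability are hypothetical, chosen for illustration only, and are not taken from the cited work.

```scheme
;; A minimal, hedged sketch of a static logical theory of magnetism,
;; written as Church-style definitions: a latent class predicate drawn
;; from a prior, plus a deterministic law over it. The 0.3 prior and
;; all names here are illustrative assumptions, not from Kemp et al.
(define magnet? (mem (lambda (x) (flip 0.3))))   ; latent: is x a magnet?
(define (attracts? x y)                          ; law: magnets attract
  (and (magnet? x) (magnet? y)))
;; the learner's data are propositions such as (attracts? 'object-a 'object-b)
```

Note that such a theory is static: it maps object properties to discrete relational propositions, with no notion of forces acting on trajectories over time.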
Here we adopt a more expressive representational framework suitable for learning the force laws and latent properties governing how objects move and interact with each other, given observations of scenes unfolding dynamically over time. We compare the performance of an ideal Bayesian learner who can represent dynamical laws and properties with the behavior of human learners asked to infer the novel physics of various microworlds from short movies (e.g., the snapshot shown in Fig. 1). While people are generally able to perform this challenging task, they also make systematic errors which are suggestive of how they might use feature-based inference schemes to approximate ideal Bayesian inference.

Formalizing theory learning

The core of our formal treatment is a hierarchical probabilistic generative model for theories (Kemp et al., 2010; Ullman, Goodman, & Tenenbaum, 2012), specialized to the setting of intuitive physical theories (Fig. 2). The hierarchy consists of several levels, with more concrete (lower-level) concepts being generated from more abstract versions in the level above, and ultimately bottoming out in data that take the form of dynamic motion stimuli. Generative knowledge at each level is represented formally using (define ...) statements in Church, a probabilistic programming language (Goodman, Mansinghka, Roy, Bonawitz, & Tenenbaum, 2008). Probabilistic programs are useful for representing knowledge with uncertainty (e.g., Stuhlmüller & Goodman, 2013). Fig. 2(iii) shows examples of probabilistic definition statements within our domain of intuitive physics, using Church. Fig. 2(i) shows the levels associated with these statements. The arrows from one level to the next represent how each level is sampled from the definitions and associated probability distributions of the level above it.

It is not possible to fully detail the technical aspects of the model in the space provided, and so we provide a general overview. The model is a hierarchy of levels from N (framework level) to 0 (observed data). The top-most level N