Theory Acquisition as Stochastic Search

Tomer D. Ullman, Noah D. Goodman, Joshua B. Tenenbaum
{tomeru, ndg, jbt}@mit.edu
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
Cambridge, MA 02139

Abstract

We present an algorithmic model for the development of children's intuitive theories within a hierarchical Bayesian framework, where theories are described as sets of logical laws generated by a probabilistic context-free grammar. Our algorithm performs stochastic search at two levels of abstraction – an outer loop in the space of theories, and an inner loop in the space of explanations or models generated by each theory given a particular dataset – in order to discover the theory that best explains the observed data. We show that this model is capable of learning correct theories in several everyday domains, and discuss the dynamics of learning in the context of children's cognitive development.

Introduction

As children learn about the world, they learn more than just a large stock of specific facts. They organize their knowledge into abstract coherent frameworks, or intuitive theories, that guide inference and learning within particular domains (Carey, 1985; Wellman & Gelman, 1992). Much recent work in computational cognitive modeling has attempted to formalize how intuitive theories are structured, used and acquired from experience (Tenenbaum, Griffiths, & Kemp, 2006), working broadly within the hierarchical Bayesian framework shown in Figure 1 (and explained in more detail below). While this program has made progress in certain respects, it has treated the problem of theory acquisition only in a very ideal sense. The child is assumed to have a hypothesis space of possible theories constrained by some "Universal Theory", and to be able to consider all possible theories in that space in light of a given body of evidence. Given sufficient evidence and a suitably constrained hypothesis space of theories, it has been shown that an ideal Bayesian learner can identify the correct theory underlying core domains of knowledge such as causality (Goodman, Ullman, & Tenenbaum, 2009), kinship and other social structures (Kemp, Goodman, & Tenenbaum, 2008).

These Bayesian computational analyses have not to date been complemented by working algorithmic models of the search process by which a child can build up an abstract theory, piece by piece, generalizing from experience. Here we describe such an algorithmic model for Bayesian theory acquisition. We show that our algorithm is capable of constructing correct if highly simplified theories for several everyday domains, and we explore the dynamics of its behavior – how theories can change as the learner's search process unfolds, as well as in response to the quantity and quality of the learner's observations.

At first glance, the dynamics of theory acquisition in childhood look nothing like the ideal learning analyses of hierarchical Bayesian models – and may not even look particularly rational or algorithmic. Different children see different random fragments of evidence and make their way to adult-like intuitive theories at different paces and along different paths. It seems unlikely that children can evaluate many candidate theories simultaneously; on the contrary, they appear to hold just one theory in mind at any time. Transitions between theories appear to be local, myopic, and semi-random, rather than systematic explorations of the hypothesis space. They are prone to backtracking, or "two steps forward, one step back".

We suggest that these dynamics are indicative of a stochastic search process, much like the Markov chain Monte Carlo (MCMC) methods that have been proposed for performing approximate probabilistic inference in complex generative models. We show how a search-based learning algorithm can begin with little or no knowledge of a domain and discover the underlying structure that best organizes it, by generating new hypotheses and checking them against its current conception of the world using a hierarchical Bayesian framework. New hypotheses are accepted probabilistically if they better account for the observed data, or if they compress it in some way. Such a search-based learning algorithm can explore a potentially infinite space of theories, but given enough time and sufficient data it tends to converge on the correct theory – or at least some approximation thereof, corresponding to a small set of abstract predicates and laws.
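To make this acceptance rule concrete, the following is a minimal Python sketch of an annealed, Metropolis-style search over theories. It is a sketch under stated assumptions, not the authors' implementation: propose and log_score are hypothetical placeholders for the grammar-based theory proposals and the hierarchical Bayesian score introduced below, and the Hastings correction for asymmetric proposals is omitted for brevity.

```python
import math
import random

def annealed_theory_search(data, init_theory, propose, log_score,
                           n_steps=10000, temp0=1.0, cooling=0.999):
    """Stochastic search over theories with simulated annealing.

    propose(theory) draws a random local change to the current theory
    (e.g., adding, deleting, or altering one law), and log_score(theory, data)
    returns an unnormalized log posterior (log prior + log likelihood).
    Both are hypothetical placeholders for the grammar-based proposals and
    hierarchical Bayesian scoring described in the text.
    """
    current, current_score = init_theory, log_score(init_theory, data)
    temp = temp0
    for _ in range(n_steps):
        candidate = propose(current)
        candidate_score = log_score(candidate, data)
        # Accept probabilistically: always if the candidate accounts for the
        # data better (higher score); otherwise with a probability that
        # shrinks as the score gap grows and as the temperature is lowered.
        delta = (candidate_score - current_score) / temp
        if delta >= 0 or random.random() < math.exp(delta):
            current, current_score = candidate, candidate_score
        temp *= cooling  # annealing: increasingly favor only improving moves
    return current, current_score
```

In this sketch, the inner loop over explanations or models generated by a fixed theory would live inside log_score, for example by summing or maximizing over candidate models of the observed data.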
The plan of the paper is as follows. We first introduce our framework for representing and evaluating theories, based on first-order logic and Bayesian inference in a hierarchical probabilistic model that specifies how the theory's logical structure constrains the data observed by a learner. We then describe our algorithmic approach to theory learning based on MCMC search, using simulated annealing to aid convergence. Finally, we study the search algorithm's behavior on two case studies of theory learning in everyday cognitive domains: the taxonomic organization of object categories and properties, and a simplified version of magnetism.

Formal framework

We work with the hierarchical probabilistic model shown in Figure 1, based on those in (Katz, Goodman, Kersting, Kemp, & Tenenbaum, 2008; Kemp et al., 2008). We assume that a domain of cognition is given, comprising one or more systems, each of which gives rise to some observed data. The learner's task is to build a theory of the domain: a set of abstract concepts and explanatory laws that together generate a hypothesis space and prior probability distribution over candidate models for systems in that domain. The laws and