Error-Driven Stochastic Search for Theories and Concepts

Owen Lewis (olewis@mit.edu), Santiago Perez (spock@mit.edu), Joshua Tenenbaum (jbt@mit.edu)
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA 02139 USA

Abstract

Bayesian models have been strikingly successful in a wide range of domains. However, the stochastic search algorithms generally used by these models have been criticized for not capturing the error-driven nature of human learning. Here, we incorporate error-driven proposals into a stochastic search algorithm and evaluate its performance on concept and theory learning problems. Compared to a model with random proposals, we find that error-driven search requires fewer proposals and fewer evaluations against labelled data.

Keywords: Bayesian inference; algorithmic level; concepts and categories

Introduction

From infancy, humans impose structure on the world with an impressive array of abstractions, conceptual categorizations, and intuitive and formal theories. Characterizing these structures and explaining how they might be learned from data are formidable challenges for both cognitive science and artificial intelligence. Over the past decade, a class of probabilistic Bayesian models has emerged as a promising and unifying account of how a learner could acquire concepts and theories (Tenenbaum, Kemp, Griffiths, & Goodman, 2011). These models cast learning as statistical inference: the learner's goal is to approximate a posterior distribution over the class of structures to be learned, weighing a candidate structure according to its ability to account for the observed data and its probability under the learner's prior beliefs. This probabilistic framing allows these models to capture both rule-like and graded aspects of human concepts and theories.

Bayesian models are able to discover good abstractions in a number of domains, and in many of these cases they make predictions that agree qualitatively or quantitatively with experimental data from human learners. For many Bayesian models, though, such predictions are confined to Marr's computational level of analysis: they predict which structures a learner will discover or prefer, namely those with high posterior probability, but they are largely agnostic about the algorithmic details of how the learner makes these discoveries.

Internally, most Bayesian models in cognitive domains approximate the target posterior distribution using stochastic search. The most widely used family of search algorithms, which includes the Metropolis-Hastings algorithm and simulated annealing, has the following iterative propose-and-accept structure. Given a current candidate structure, the algorithm perturbs it, generating a new candidate called a proposal. The proposal is then evaluated, and if it is accepted it displaces the previous candidate as the current hypothesis. Usually, a proposal is accepted deterministically if it has higher posterior probability than the current hypothesis, and stochastically if it has lower posterior probability. Algorithms of this kind are simple, robust, and effective, but it has been unclear how they relate to the processes of human learning.
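For concreteness, the sketch below (ours, with placeholder names, not code from any of the models cited here) implements this propose-and-accept loop in Python. The log_posterior function stands for the model-specific score log P(data | h) + log P(h), and perturb stands for the model-specific proposal distribution; symmetric proposals are assumed, so no Hastings correction term appears in the acceptance rule.

    import math
    import random

    def propose_and_accept(init, log_posterior, perturb, n_steps):
        """Generic stochastic search over structured hypotheses.

        init:          starting hypothesis (any structure)
        log_posterior: h -> log P(data | h) + log P(h), up to a constant
        perturb:       h -> randomly perturbed copy of h (the proposal)
        """
        current, current_score = init, log_posterior(init)
        for _ in range(n_steps):
            proposal = perturb(current)
            proposal_score = log_posterior(proposal)
            delta = proposal_score - current_score
            # Accept deterministically if the proposal scores higher; if it
            # scores lower, accept with probability equal to the ratio of
            # posterior probabilities, exp(delta).
            if random.random() < math.exp(min(0.0, delta)):
                current, current_score = proposal, proposal_score
        return current

Simulated annealing fits the same template, with the acceptance threshold scaled by a temperature parameter that decreases over the course of search.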
Recently, though, researchers have started to address this issue (Griffiths, Vul, & Sanborn, 2012). For instance, Ullman et al. (2012) examine a collection of theory learning tasks, showing that a stochastic search model can qualitatively reproduce the dynamics of human learning across several domains. Bonawitz et al. (2011) connect approximate Bayesian inference to earlier algorithmic-level models of human concept learning, and construct sequential approximation schemes that are able to capture aspects of human performance on a trial-by-trial basis.

Despite these successes, criticisms of stochastic search as a process model of human learning remain. One of the most powerful of these criticisms, made by Schulz (2012), hinges on the proposal mechanism by which new candidates are produced. In most existing stochastic search models, including the process models of Ullman et al. and Bonawitz et al., proposals are made randomly; Schulz argues that human learning is more structured. Specifically, human learning is error-driven: learners make proposals that fix specific deficiencies in their current hypothesis.

Efficient, error-driven search may hold the answer to another criticism of stochastic search. Humans (Feldman, 2000), even young children (Bonawitz et al., 2012), are able to learn remarkably quickly and efficiently, but existing search models are often slow. For instance, Bonawitz et al. (2012) show that children are able to learn a theory of magnetism in a matter of minutes, but computational models take many hours to solve similar problems. Relatedly, human learning performance scales remarkably well with problem complexity (Feldman, 2000), while computational models struggle as search spaces become larger. We present a concrete implementation of error-driven search and show that it can help close this gap. By considering only those hypotheses that fix specific problems, an error-driven learner can avoid irrelevant parts of the search space and converge to a good solution quickly.
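As an illustration of the general idea (a hypothetical sketch under assumed helper functions, not the exact mechanism evaluated in this paper), an error-driven proposer for concept learning might sample an example that the current hypothesis misclassifies and consider only local edits that correct the prediction on that example:

    import random

    def error_driven_proposal(hypothesis, data, labels, edits):
        """Propose a local edit targeted at a specific error.

        hypothesis:  current candidate concept; hypothesis(x) -> predicted label
        data/labels: observed examples and their true labels
        edits:       assumed helper; edits(h) -> list of local variants of h
        """
        # Find the examples the current hypothesis gets wrong.
        errors = [(x, y) for x, y in zip(data, labels) if hypothesis(x) != y]
        if not errors:
            return random.choice(edits(hypothesis))  # nothing to fix: random move
        x, y = random.choice(errors)
        # Keep only edits that repair the sampled error, steering search away
        # from parts of the hypothesis space irrelevant to current deficiencies.
        fixes = [h for h in edits(hypothesis) if h(x) == y]
        return random.choice(fixes) if fixes else random.choice(edits(hypothesis))

Plugged into the propose-and-accept loop above in place of perturb, a proposer like this spends its moves on candidates that address observed errors. Note that such data-dependent proposals are not symmetric, so the resulting search is a heuristic unless a Hastings correction is added to the acceptance rule.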
A rich tradition of error-driven learning models exists in the classical literature on symbolic learning in AI and cognitive science. For instance, version space learning (Mitchell, 1978), FOIL (Quinlan, 1990), and explanation-based learning (Mitchell, Keller, & Kedar-Cabelli, 1986) all explore the idea of iteratively modifying hypotheses to account for specific observations. However, despite enjoying some notable successes, these models lack some of the capabilities of Bayesian models, for instance the ability to account for gradedness in human learning, and for humans' ability to learn from noisy data.
References

Bonawitz, E., Denison, S., Gopnik, A., & Griffiths, T. L. (2014). Win-stay, lose-sample: A simple sequential algorithm for approximating Bayesian inference. Cognitive Psychology.

Bonawitz, E., Ullman, T. D., Gopnik, A., & Tenenbaum, J. B. (2012). Sticking to the evidence? A computational and behavioral case study of micro-theory change in the domain of magnetism. Proceedings of the 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).

Feldman, J. (2000). Minimization of Boolean complexity in human concept learning. Nature.

Goodman, N. D., Tenenbaum, J. B., Feldman, J., & Griffiths, T. L. (2008). A rational analysis of rule-based concept learning. Cognitive Science.

Griffiths, T. L., Vul, E., & Sanborn, A. N. (2012). Bridging levels of analysis for probabilistic models of cognition. Current Directions in Psychological Science.

Katz, Y., Goodman, N. D., Kersting, K., Kemp, C., & Tenenbaum, J. B. (2008). Modeling semantic cognition as logical dimensionality reduction. Proceedings of the Annual Conference of the Cognitive Science Society.

Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T., & Ueda, N. (2006). Learning systems of concepts with an infinite relational model. Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI).

Miller, G. (1981). Cognitive science. Science.

Mitchell, T. M. (1978). Version spaces: An approach to concept learning. Doctoral dissertation, Stanford University.

Mitchell, T. M., Keller, R. M., & Kedar-Cabelli, S. T. (1986). Explanation-based generalization: A unifying view. Machine Learning.

Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning.

Schulz, L. (2012). Finding new facts; thinking new thoughts. Advances in Child Development and Behavior.

Schulz, L. E., Goodman, N. D., Tenenbaum, J. B., & Jenkins, A. C. (2008). Going beyond the evidence: Abstract laws and preschoolers' responses to anomalous data. Cognition.

Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science.

Tu, Z., Zhu, S.-C., & Shum, H.-Y. (2001). Image segmentation by data driven Markov chain Monte Carlo. Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV).

Tu, Z., & Zhu, S.-C. (2002). Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Ullman, T. D., Goodman, N. D., & Tenenbaum, J. B. (2010). Theory acquisition as stochastic search. Proceedings of the Annual Conference of the Cognitive Science Society.