Probabilistic models of cognition: Exploring the laws of thought

Abstract

Cognitive science aims to reverse-engineer the mind, and many of the engineering challenges the mind faces involve inductive inference. The probabilistic approach to modeling cognition begins with the goal of understanding these inductive problems in computational terms: what makes them difficult, and how they can be solved in principle. Mental processes are then modeled using algorithms for approximately implementing these ideal solutions, and neural processes are viewed as mechanisms for implementing these algorithms in biological hardware. This top-down approach is analogous to historical progressions of theory building in other natural sciences, moving from macro-level functional explanations of observable phenomena to micro-level mechanistic accounts. Typical connectionist models, by contrast, follow a bottom-up approach, starting from an abstract characterization of neural mechanisms and exploring what macro-level functional phenomena might emerge from those mechanisms. We suggest that the top-down approach is likely to yield more rapid progress towards understanding human inductive inference.

Introduction: The place of probabilistic models in cognitive science

Most approaches to modeling human cognition agree that the mind can be studied on multiple levels. David Marr [1] defined three such levels: a computational level characterizing the problem faced by the mind and how it can be solved in functional terms; an algorithmic level describing the processes that the mind executes to produce this solution; and a hardware level specifying how those processes are instantiated in the brain. Cognitive scientists disagree over whether explanations at all levels are meaningful, and on the order in which levels should be explored.

Many connectionists advocate a bottom-up or “mechanism-first” strategy, starting by exploring the problems that neural processes can solve. This often goes with a philosophy of “emergentism” or “eliminativism”: Higher-level explanations do not have independent validity but are at best approximations to the mechanistic truth, referring to emergent phenomena or epiphenomena produced by lower-level mechanisms. In contrast, probabilistic models of cognition pursue a top-down or “function-first” strategy, beginning with abstract principles that allow agents to solve problems posed by the world – the functions that minds perform – and then attempting to reduce these principles to psychological and neural processes. Understanding the lower levels does not eliminate the need for higher-level models, because the lower levels implement the functions specified at higher levels.

Many sciences develop theories spanning levels of analysis, and most follow the top-down approach. Scientific progress begins with formal theories that explain observable macro-level phenomena in functional terms, and subsequently relates the macro-level theory to a more mechanistic micro-level. Consider the progressions from the gas laws in thermodynamics to the theory of statistical mechanics, from classical Newtonian mechanics and the physics of radiation and atoms to quantum theory, or from the laws of classical Mendelian genetics to DNA. Each proceeded from a functional level down to a mechanistic level, with the high-level theory standing as a natural limit or special case of the lower-level theory.
Despite successful reduction to a lower level, the high-level theory retains lasting value: It continues to be taught, and offers engineers tools for designing systems with specified functions.

Explanations at a functional level also have a long history in cognitive science. Boole’s treatise An Investigation of the Laws of Thought [2] presented mathematical logic as a functional characterization of deductive reasoning, and indicated that probability theory might play a similar role for inductive inference (see Box 1). Virtually all modern attempts to engineer human-like artificial intelligence in machines, from the Logic Theory Machine [3] to the most successful contemporary paradigms [4], have started with computational principles rather than hardware mechanisms.

The potential of probabilistic models of cognition comes from the central role that inductive problems play in cognitive science: Most of cognition, including acquiring a language, a concept, or a causal model, requires uncertain conjecture from partial or noisy information. A probabilistic framework lets us address key questions about these phenomena: How much information is needed? What constraints on learning are necessary? What are the consequences of using different kinds of hypotheses? These are computational-level questions, and they are most naturally answered by computational-level theories (see Box 2).

While probabilistic models of cognition have largely focused on the abstract characterization of problems and their solutions, we view this as only a starting point. Computational-level models can also guide our study of the algorithmic and hardware levels. A similar top-down strategy was endorsed by Marr [1], and can be seen in the major precursors to contemporary probabilistic models of cognition, such as Shepard’s [5] universal laws and Anderson’s [6] rational analysis. We view this strategy as the most promising way to explore what we see as the central question of human cognition: how people are able to acquire and act on rich knowledge of the world given the limited data that they observe.

The rest of this article addresses specific points of debate. Most pressing are questions of representation: How is knowledge of the world represented in the mind and brain? Top-down approaches highlight the need for rich systems of knowledge, and from this standpoint, a major advantage of probabilistic models is their ability to incorporate structured representations. We illustrate this point with examples of structured probabilistic models of inductive learning and reasoning, and discuss how their representational claims should be interpreted. We then consider how probabilistic models can guide attempts to build cognitive theories spanning levels of analysis. We close by considering the relative merits of top-down and bottom-up approaches to studying cognition.

Structured representations and probabilistic models

A probabilistic model starts with a formal characterization of an inductive problem, specifying the hypotheses under consideration, the relationship between these hypotheses and observable data, and the prior probability of each hypothesis. Probabilistic models therefore provide a transparent account of the assumptions that allow a problem to be solved, and make it easy to explore the consequences of different assumptions.
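For concreteness, these three ingredients are combined by Bayes’ rule. The block below is a generic statement of that computation in standard notation (a summary for reference, not a model of any particular task discussed here), with the hypothesis space written as a set over which the posterior is normalized:

```latex
% Bayes' rule: posterior probability of hypothesis h given observed data d,
% obtained by combining the prior P(h) with the likelihood P(d | h)
% and normalizing over the hypothesis space \mathcal{H}.
P(h \mid d) \;=\; \frac{P(d \mid h)\, P(h)}{\sum_{h' \in \mathcal{H}} P(d \mid h')\, P(h')}
```

Different probabilistic models differ only in how the hypothesis space, the prior, and the likelihood are specified.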
Hypotheses can take any form, from weights in a neural network [7, 8] to structured symbolic representations, as long as they specify a probability distribution over observable data. The approach makes no a priori commitment to any class of representations, but provides a framework for evaluating different representational proposals.

The flexibility of probabilistic models allows them to combine the strengths of traditional approaches to modeling cognition. Approaches based on logic, such as production systems [9], can exploit the combinatorics of discrete representations to define infinite hypothesis spaces that are generated from a few basic principles, and that capture apparent properties of human thought, such as compositionality and recursion. However, such logical approaches have difficulty modeling inductive inferences, such as learning and reasoning under uncertainty. In contrast, connectionist models support statistical learning and can express aspects of uncertainty through graded representations, but are challenged by inferences that seem better characterized in terms of symbol manipulation over structured representations [10, 11]. Probabilistic models offer a synthesis of both approaches by allowing statistical inference over symbolic, structured representations (see Figure 1).

The ability to work with different kinds of representations allows the probabilistic approach to give a unified account of inferences about qualitatively different domains. Property induction provides a simple example. Consider a problem in which participants learn that horses, cows, and dolphins have a certain property and must then decide whether all mammals are likely to have this property. Some researchers have proposed that inferences about novel properties of animals are supported by tree-structured representations [12], but others suggest that the underlying mental representations are closer to continuous spaces [13]. One way to resolve this debate is to define a probabilistic framework that can use either kind of representation, and to see which representation best explains human inferences [14] (see the illustrative sketch below). The results in Figure 2a suggest that a tree structure is the better of these two alternatives.

Although knowledge about relationships between species is better captured by a tree than by a low-dimensional space, tree structures are not appropriate for every domain. Figure 2b shows results from a study in which the items are cities and participants are told, for example, that a certain kind of Native American artifact is found near Houston, Durham, and Orlando, and then asked whether this artifact is likely to be found near all major American cities. Here the data suggest that knowledge about spatial relationships between cities is better captured by a low-dimensional space than by a tree. A probabilistic framework can explain both patterns of inference, for animals and for cities, by incorporating appropriate representations for each domain. A probabilistic analysis also indicates how people might acquire qualitatively different representations for different domains [14] (see Figure 2c). Although simple representations such
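To make the property-induction computation concrete, the sketch below works through a toy version of the calculation in Python. It is our illustration under simplifying assumptions, not the model of [14]: the species set, the hand-built hypothesis space of property extensions, the uniform prior, and the weak-sampling likelihood are all placeholders. The structured models compared in Figure 2 instead generate the prior over extensions from a tree or a low-dimensional space.

```python
# Toy Bayesian property induction (illustrative only).
# Each hypothesis is a candidate extension of the novel property: the set of
# species assumed to have it. A structured model would assign higher prior
# probability to extensions that cohere with a tree or spatial representation;
# here the prior is uniform for simplicity.

ALL_SPECIES = {"horse", "cow", "dolphin", "seal", "squirrel", "mouse"}

hypotheses = [
    frozenset({"horse", "cow"}),                     # e.g. farm animals only
    frozenset({"dolphin", "seal"}),                  # e.g. marine mammals only
    frozenset({"horse", "cow", "dolphin", "seal"}),  # e.g. large mammals
    frozenset(ALL_SPECIES),                          # all mammals in the toy domain
]
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}  # uniform prior (placeholder)

def likelihood(observations, h):
    """Weak sampling: the observed species must all lie in the extension."""
    return 1.0 if set(observations) <= h else 0.0

def posterior(observations):
    """Bayes' rule over the toy hypothesis space (assumes at least one
    hypothesis is consistent with the observations)."""
    unnorm = {h: likelihood(observations, h) * prior[h] for h in hypotheses}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

def prob_has_property(target, observations):
    """Posterior probability that `target` also has the property: sum the
    posterior over every extension that includes it."""
    return sum(p for h, p in posterior(observations).items() if target in h)

obs = ["horse", "cow", "dolphin"]
print(prob_has_property("seal", obs))      # 1.0: every surviving hypothesis includes seal
print(prob_has_property("squirrel", obs))  # 0.5: only the broadest hypothesis includes squirrel
```

Replacing the uniform prior with one that favors extensions forming connected regions of a tree, or compact regions of a low-dimensional space, is what lets the framework express and compare the two representational proposals discussed above.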

References

[1] G. Boole, An Investigation of the Laws of Thought: On Which Are Founded the Mathematical Theories of Logic and Probabilities, 2007.

[2] A. Newell et al., The Logic Theory Machine: A complex information processing system, 1956, IRE Trans. Inf. Theory.

[3] L. Rips, Inductive judgments about natural categories, 1975.

[4] D. L. Medin et al., Context theory of classification learning, 1978.

[5] R. Nosofsky, Attention, similarity, and the identification-categorization relationship, 1986, Journal of Experimental Psychology: General.

[6] R. Shepard et al., Toward a universal law of generalization for psychological science, 1987, Science.

[7] J. Fodor et al., Connectionism and cognitive architecture: A critical analysis, 1988, Cognition.

[8] J. R. Anderson, The Adaptive Character of Thought, 1990.

[9] J. R. Anderson et al., Rules of the Mind, 1993.

[10] P. Norvig et al., Artificial Intelligence: A Modern Approach, 1995.

[11] D. J. C. MacKay, Probable networks and plausible predictions: A review of practical Bayesian methods for supervised neural networks, 1995.

[12] R. M. Neal, Bayesian Learning for Neural Networks, 1996.

[13] P. Cheng, From covariation to causation: A causal power theory, 1997.

[14] S. Atran, Folk biology and the anthropology of science: Cognitive universals and cultural particulars, 1998, Behavioral and Brain Sciences.

[15] J. B. Tenenbaum, Rules and Similarity in Concept Learning, 1999, NIPS.

[16] J. Pearl, Causality: Models, Reasoning and Inference, 2000.

[17] G. Marcus, The Algebraic Mind: Integrating Connectionism and Cognitive Science, 2001.

[18] D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, 1982.

[19] J. B. Tenenbaum et al., Generalization, similarity, and Bayesian inference, 2001, Behavioral and Brain Sciences.

[20] N. Chater et al., The Generalized Universal Law of Generalization, 2001, arXiv.

[21] L. B. Smith et al., Object name learning provides on-the-job training for attention, 2002, Psychological Science.

[22] P. Spirtes et al., Causation, Prediction, and Search, 2000.

[23] T. S. Lee et al., Hierarchical Bayesian inference in the visual cortex, 2003, Journal of the Optical Society of America A.

[24] J. L. McClelland et al., Semantic Cognition: A Parallel Distributed Processing Approach, 2004.

[25] D. Klein et al., Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency, 2004, ACL.

[26] F. Han et al., Bottom-up/top-down image parsing by attribute graph grammar, 2005, ICCV.

[27] J. B. Tenenbaum et al., Structure and strength in causal induction, 2005, Cognitive Psychology.

[28] L. S. Zettlemoyer et al., Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars, 2005, UAI.

[29] L. B. Smith et al., From the lexicon to expectations about kinds: A role for associative learning, 2005, Psychological Review.

[30] J. B. Tenenbaum et al., Optimal Predictions in Everyday Cognition, 2006, Psychological Science.

[31] T. L. Griffiths et al., A more rational model of categorization, 2006.

[32] C. D. Manning et al., Probabilistic models of language processing and acquisition, 2006, Trends in Cognitive Sciences.

[33] J. B. Tenenbaum et al., Poverty of the Stimulus? A Rational Approach, 2006.

[34] G. E. Hinton, Learning multiple layers of representation, 2007, Trends in Cognitive Sciences.

[35] J. B. Tenenbaum et al., Word learning as Bayesian inference, 2007, Psychological Review.

[36] P. M. B. Vitányi et al., ‘Ideal learning’ of natural language: Positive results about learning from positive evidence, 2007.

[37] A. C. Courville et al., The pigeon as particle filter, 2007, NIPS.

[38] A. Gopnik et al., Causal learning: Psychology, philosophy, and computation, 2007.

[39] D. Klein et al., The Infinite PCFG Using Hierarchical Dirichlet Processes, 2007, EMNLP.

[40] F. Xu and J. B. Tenenbaum, Sensitivity to sampling in Bayesian word learning, 2007, Developmental Science.

[41] C. Kemp, A. Perfors, and J. B. Tenenbaum, Learning overhypotheses with hierarchical Bayesian models, 2007, Developmental Science.

[42] J. B. Tenenbaum et al., Two proposals for causal grammars, 2007.

[43] N. D. Goodman et al., Learning Causal Schemata, 2007.

[44] T. L. Griffiths et al., Modeling the effects of memory on human online sentence processing with particle filters, 2008, NIPS.

[45] W. J. Ma et al., Spiking networks for Bayesian inference and choice, 2008, Current Opinion in Neurobiology.

[46] C. Kemp et al., The discovery of structural form, 2008, Proceedings of the National Academy of Sciences.

[47] A. Yuille et al., Bayesian generic priors for causal learning, 2008, Psychological Review.

[48] N. H. Feldman et al., Performing Bayesian Inference with Exemplar Models, 2008.

[49] T. L. Griffiths et al., A Bayesian framework for word segmentation: Exploring the effects of context, 2009, Cognition.

[50] S. D. Brown et al., Detecting and predicting changes, 2009, Cognitive Psychology.

[51] M. C. Frank et al., Using speakers’ referential intentions to model early cross-situational word learning, 2009, Psychological Science.

[52] J. B. Tenenbaum et al., Learning to learn categories, 2009.

[53] J. B. Tenenbaum et al., Structured statistical models of inductive reasoning, 2009, Psychological Review.