Language Evolution by Iterated Learning With Bayesian Agents

Languages are transmitted from person to person and generation to generation via a process of iterated learning: people learn a language from other people who once learned that language themselves. We analyze the consequences of iterated learning for learning algorithms based on the principles of Bayesian inference, assuming that learners compute a posterior distribution over languages by combining a prior (representing their inductive biases) with the evidence provided by linguistic data. We show that when learners sample languages from this posterior distribution, iterated learning converges to a distribution over languages that is determined entirely by the prior. Under these conditions, iterated learning is a form of Gibbs sampling, a widely used Markov chain Monte Carlo algorithm. The consequences of iterated learning are more complicated when learners choose the language with maximum posterior probability: the outcome is then affected by both the prior of the learners and the amount of information transmitted between generations. We show that in this case, iterated learning corresponds to another statistical inference algorithm, a variant of the expectation-maximization (EM) algorithm. These results clarify the role of iterated learning in explanations of linguistic universals and provide a formal connection between constraints on language acquisition and the languages that come to be spoken, suggesting that information transmitted via iterated learning will ultimately come to mirror the minds of the learners.
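The contrast between sampling and maximizing learners can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: there are only two languages, each generation hears `n` utterances, language 0 produces utterance "a" with probability `1 - eps` and language 1 with probability `eps`, and ties in the maximum a posteriori (MAP) rule go to language 0. The function names are hypothetical. The code builds the exact generation-to-generation transition matrix and its stationary distribution for both kinds of learner.

```python
from math import comb

def likelihood(k, n, h, eps):
    """P(k of n utterances are 'a' | language h) in the toy model."""
    p_a = 1.0 - eps if h == 0 else eps
    return comb(n, k) * p_a ** k * (1.0 - p_a) ** (n - k)

def posterior(k, n, prior, eps):
    """Posterior over the two languages after hearing k 'a'-utterances."""
    joint = [prior[h] * likelihood(k, n, h, eps) for h in range(2)]
    z = sum(joint)
    return [j / z for j in joint]

def transition_matrix(prior, n, eps, rule):
    """T[new][old]: probability the next learner adopts `new`
    when the current speaker uses `old`. rule is 'sample' or 'map'."""
    T = [[0.0, 0.0], [0.0, 0.0]]
    for old in range(2):
        for k in range(n + 1):
            p_data = likelihood(k, n, old, eps)
            choice = posterior(k, n, prior, eps)
            if rule == "map":
                # Deterministic choice; ties go to language 0 (assumption).
                best = 0 if choice[0] >= choice[1] else 1
                choice = [1.0 if h == best else 0.0 for h in range(2)]
            for new in range(2):
                T[new][old] += choice[new] * p_data
    return T

def stationary(T):
    """Stationary distribution of a two-state chain in closed form."""
    into0, into1 = T[0][1], T[1][0]
    total = into0 + into1
    return [into0 / total, into1 / total]

if __name__ == "__main__":
    prior = [0.9, 0.1]
    for n in (1, 3):
        for rule in ("sample", "map"):
            T = transition_matrix(prior, n, eps=0.2, rule=rule)
            print(n, rule, stationary(T))
```

In this toy setting the stationary distribution for sampling learners matches the prior for any `n`, in line with the convergence result stated above, whereas for MAP learners it shifts with `n` and need not equal the prior.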

[1]  N. Chater,et al.  Rational models of cognition , 1998 .

[2]  Elie Bienenstock,et al.  Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.

[3]  M A Nowak,et al.  The evolutionary dynamics of grammar acquisition. , 2001, Journal of theoretical biology.

[4]  Partha Niyogi,et al.  The Logical Problem of Language Change , 1995 .

[5]  James L. McClelland,et al.  On learning the past-tenses of English verbs: implicit rules or parallel distributed processing , 1986 .

[6]  D. Vere-Jones Markov Chains , 1972, Nature.

[7]  N. Chater,et al.  An introduction to rational models of cognition , 1998 .

[8]  Nir Vulkan An Economist's Perspective on Probability Matching , 2000 .

[9]  P. Niyogi,et al.  A language learning model for finite parameter spaces , 1996, Cognition.

[10]  Frederick J. Newmeyer,et al.  Explaining language universals , 1990, Journal of Linguistics.

[11]  Simon Kirby,et al.  Innateness and culture in the evolution of language , 2006, Proceedings of the National Academy of Sciences.

[12]  D. G. Rees,et al.  Foundations of Statistics , 1989 .

[13]  David Mackay,et al.  Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks , 1995 .

[14]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[15]  G. Miller,et al.  Cognitive science. , 1981, Science.

[16]  Martin A. Nowak,et al.  The evolution of syntactic communication , 2000, Nature.

[17]  Simon Kirby,et al.  Iterated Learning: A Framework for the Emergence of Language , 2003, Artificial Life.

[18]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.

[19]  Peter Green,et al.  Markov chain Monte Carlo in Practice , 1996 .

[20]  M. Studdert-Kennedy,et al.  Approaches To The Evolution Of Language: Social And Cognitive Bases , 1998 .

[21]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Nir Friedman,et al.  The Bayesian Structural EM Algorithm , 1998, UAI.

[23]  References , 1971 .

[24]  N. Chater,et al.  Rational models of cognition , 1998 .

[25]  Refractor Vision , 2000, The Lancet.

[26]  Martin A Nowak,et al.  Language dynamics in finite populations. , 2003, Journal of theoretical biology.

[27]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[28]  Adrian E. Raftery,et al.  Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors , 1999 .

[29]  S. Nielsen The stochastic EM algorithm: estimation and asymptotic results , 2000 .

[30]  Gersende Fort,et al.  Convergence of the Monte Carlo expectation maximization for curved exponential families , 2003 .

[31]  Gilles Celeux,et al.  On Stochastic Versions of the EM Algorithm , 1995 .

[32]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[33]  G. C. Wei,et al.  A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms , 1990 .

[34]  Nick Chater,et al.  Reconciling simplicity and likelihood principles in perceptual organization. , 1996, Psychological review.

[35]  Michael I. Jordan Learning in Graphical Models , 1999, NATO ASI Series.

[36]  M. Hirsch,et al.  Differential Equations, Dynamical Systems, and Linear Algebra , 1974 .

[37]  Henry Brighton,et al.  Compositional Syntax From Cultural Transmission , 2002, Artificial Life.

[38]  Peter Cole,et al.  Head movement and long-distance reflexives , 1994 .

[39]  P. Niyogi,et al.  Computational and evolutionary aspects of language , 2002, Nature.

[40]  Geoffrey E. Hinton,et al.  A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[41]  Simon Kirby,et al.  Spontaneous evolution of linguistic structure-an iterated learning model of the emergence of regularity and irregularity , 2001, IEEE Trans. Evol. Comput..

[42]  Edward H. Ip,et al.  On Single Versus Multiple Imputation for a Class of Stochastic Algorithms Estimating Maximum Likelihood , 2002, Comput. Stat..

[43]  É. Moulines,et al.  Convergence of a stochastic approximation version of the EM algorithm , 1999 .

[44]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[45]  Jun S. Liu,et al.  Covariance Structure and Convergence Rate of the Gibbs Sampler with Various Scans , 1995 .

[46]  Robert A. Wilson,et al.  Book Reviews: The MIT Encyclopedia of the Cognitive Sciences , 2000, CL.

[47]  Sean H. Rice,et al.  Evolutionary Theory: Mathematical and Conceptual Foundations , 2004 .

[48]  R. Shepard,et al.  Toward a universal law of generalization for psychological science. , 1987, Science.

[49]  Ted Briscoe,et al.  Linguistic Evolution through Language Acquisition: Formal and Computational Models. , 2002 .

[50]  M. Tribus,et al.  Probability theory: the logic of science , 2003 .

[51]  W. T. Maddox,et al.  Relations between prototype, exemplar, and decision bound models of categorization , 1993 .

[52]  G. Celeux,et al.  A stochastic approximation type EM algorithm for the mixture problem , 1992 .

[53]  William J. Stewart,et al.  Numerical Solution of Markov Chains , 1993 .

[54]  S. Kirby,et al.  The emergence of linguistic structure: an overview of the iterated learning model , 2002 .

[55]  Noam Chomsky,et al.  वाक्यविन्यास का सैद्धान्तिक पक्ष = Aspects of the theory of syntax , 1965 .

[56]  J. Scharf [Language evolution]. , 1973, Gegenbaurs morphologisches Jahrbuch.

[57]  R. Nosofsky Attention, similarity, and the identification-categorization relationship. , 1986, Journal of experimental psychology. General.

[58]  M. Kimura The Neutral Theory of Molecular Evolution: Introduction , 1983 .

[59]  R. Duncan Luce,et al.  Individual Choice Behavior , 1959 .

[60]  Edward H. Ip,et al.  Stochastic EM: method and application , 1996 .

[61]  Simon Kirby,et al.  From UG to universals: Linguistic adaptation through iterated learning , 2004 .

[62]  C. Robert The Bayesian choice : a decision-theoretic motivation , 1996 .

[63]  T. Jukes,et al.  The neutral theory of molecular evolution. , 2000, Genetics.

[64]  R. Nosofsky Attention and learning processes in the identification and categorization of integral stimuli. , 1987, Journal of experimental psychology. Learning, memory, and cognition.

[65]  Comrie Bernard Language Universals and Linguistic Typology , 1982 .

[66]  B. Nordstrom FINITE MARKOV CHAINS , 2005 .

[67]  C. Bishop The MIT Encyclopedia of the Cognitive Sciences , 1999 .

[68]  Jeffrey S. Rosenthal,et al.  Convergence Rates for Markov Chains , 1995, SIAM Rev..

[69]  F. Ashby,et al.  Categorization as probability density estimation , 1995 .

[70]  Joshua B. Tenenbaum,et al.  Inferring causal networks from observations and interventions , 2003, Cogn. Sci..

[71]  Jeff A. Bilmes,et al.  A gentle tutorial of the em algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models , 1998 .

[72]  Carla L. Hudson Kam,et al.  Regularizing Unpredictable Variation: The Roles of Adult and Child Learners in Language Formation and Change , 2005 .

[73]  J. Kruschke,et al.  ALCOVE: an exemplar-based connectionist model of category learning. , 1992, Psychological review.

[74]  N. Chater,et al.  Ten years of the rational analysis of cognition , 1999, Trends in Cognitive Sciences.

[75]  William J. Stewart,et al.  Introduction to the numerical solution of Markov Chains , 1994 .

[76]  William K. Estes Approaches to human learning and motivation , 1976 .

[77]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[78]  B. de Boer,et al.  The evolution of language : proceedings of the 8th International Conference (EVOLANG8), Utrecht, Netherlands, 14-17 April 2010 , 2010 .

[79]  Willard Van Orman Quine,et al.  Word and Object , 1960 .

[80]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[81]  Radford M. Neal Connectionist Learning of Belief Networks , 1992, Artif. Intell..

[82]  Partha Niyogi,et al.  A Dynamical Systems Model for Language Change , 1994, Complex Syst..

[83]  W. Stolz Universals of Language. , 1968 .

[84]  Umesh V. Vazirani,et al.  An Introduction to Computational Learning Theory , 1994 .

[85]  Simon Kirby,et al.  Function, Selection, and Innateness: The Emergence of Language Universals , 1999 .

[86]  S. Potter,et al.  Universals of Language , 1966 .

[87]  Partha Niyogi,et al.  Evolutionary Consequences of Language Learning , 1997 .

[88]  W. Wong,et al.  The calculation of posterior distributions by data augmentation , 1987 .

[89]  John R. Anderson The Adaptive Character of Thought , 1990 .

[90]  R. Sherman,et al.  Conditions for convergence of Monte Carlo EM sequences with an application to product diffusion modeling , 1999 .

[91]  G. Celeux,et al.  Asymptotic properties of a stochastic EM algorithm for estimating mixing proportions , 1993 .

[92]  J. Tenenbaum,et al.  Generalization, similarity, and Bayesian inference. , 2001, The Behavioral and brain sciences.

[93]  B. Carlin,et al.  On the Convergence of Successive Substitution Sampling , 1992 .

[94]  J. Nerbonne The MIT Encyclopedia of the Cognitive Sciences edited by Robert A. Wilson and Frank C. Keil , 2000 .

[95]  M A Nowak,et al.  Evolution of universal grammar. , 2001, Science.