Online Learning Mechanisms for Bayesian Models of Word Segmentation

In recent years, Bayesian models have become increasingly popular as a way of understanding human cognition. Ideal-learner Bayesian models assume that cognition can be usefully understood as optimal behavior under uncertainty. Modeling studies across a range of domains support this hypothesis (e.g., Griffiths and Tenenbaum, Cognitive Psychology, 51, 354–384, 2005; Xu and Tenenbaum, Psychological Review, 114, 245–272, 2007). These models aim to explain why humans behave as they do given the task and the data they encounter. However, they typically set aside questions that more traditional psychological models address, such as how the observed behavior is produced under constraints on memory and processing. Here, we use word segmentation as a case study for investigating these questions within a Bayesian framework. We consider some limitations of the infant learner and develop several online learning algorithms that take these limitations into account. Each algorithm can be viewed as a different method of approximating the same ideal learner. We tested the algorithms on corpora of English child-directed speech and found that the constrained learner's behavior depends non-trivially on how the learner's limitations are implemented. Interestingly, biases that help an ideal learner sometimes hinder a constrained learner. In a few cases, constrained learners perform as well as or better than the ideal learner. This suggests that the transition from a computational-level solution for acquisition to an algorithmic-level one is not straightforward.
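The following minimal Python sketch makes the idea of an online, memory-limited approximation concrete. It is not the authors' implementation. It assumes that the ideal learner is a Dirichlet-process unigram model of the kind described by Goldwater, Griffiths, and Johnson [23]. The class name `OnlineUnigramSegmenter` is hypothetical, and so are the parameters `alpha`, `p_stop`, `n_symbols`, and `max_len`. The base distribution (geometric word length, uniform phoneme choice) is a simplifying assumption. The learner sees each utterance once and uses dynamic programming to find the single most probable segmentation under its current lexicon counts. It then updates those counts and discards the utterance.

```python
import math
from collections import Counter


class OnlineUnigramSegmenter:
    """Hypothetical online learner: one pass, one best segmentation per utterance.

    Assumes a Dirichlet-process unigram lexicon; all parameter values are illustrative.
    """

    def __init__(self, alpha=20.0, p_stop=0.3, n_symbols=50, max_len=10):
        self.alpha = alpha          # DP concentration parameter (assumed value)
        self.p_stop = p_stop        # per-phoneme probability of ending a word in P0
        self.n_symbols = n_symbols  # assumed phoneme inventory size
        self.max_len = max_len      # longest candidate word considered
        self.counts = Counter()     # word token counts: the learner's only memory
        self.total = 0              # total word tokens stored so far

    def base_prob(self, word):
        # P0(w): geometric distribution over length times uniform phoneme choice.
        length = len(word)
        return (self.p_stop * (1 - self.p_stop) ** (length - 1)
                * (1.0 / self.n_symbols) ** length)

    def word_prob(self, word):
        # DP predictive probability: (n_w + alpha * P0(w)) / (N + alpha).
        return ((self.counts[word] + self.alpha * self.base_prob(word))
                / (self.total + self.alpha))

    def segment(self, utterance):
        # Viterbi-style dynamic programming over word-boundary positions.
        # Counts are held fixed within the utterance (an approximation).
        n = len(utterance)
        best = [-math.inf] * (n + 1)
        back = [0] * (n + 1)
        best[0] = 0.0
        for end in range(1, n + 1):
            for start in range(max(0, end - self.max_len), end):
                score = best[start] + math.log(self.word_prob(utterance[start:end]))
                if score > best[end]:
                    best[end], back[end] = score, start
        # Recover the best segmentation by following back-pointers.
        words, end = [], n
        while end > 0:
            start = back[end]
            words.append(utterance[start:end])
            end = start
        return words[::-1]

    def observe(self, utterance):
        # Segment once, update counts, then forget the utterance itself.
        words = self.segment(utterance)
        self.counts.update(words)
        self.total += len(words)
        return words


if __name__ == "__main__":
    learner = OnlineUnigramSegmenter()
    for utt in ["the", "dog", "the", "dog"]:
        learner.observe(utt)
    print(learner.segment("thedog"))  # ['the', 'dog']
```

This learner commits to one segmentation per utterance and never revisits earlier input. As a result, early decisions persist: for example, storing an unfamiliar multi-word utterance as a single lexical item. This illustrates one way a processing limitation can make a constrained learner diverge from an ideal learner that can reconsider the segmentation of the whole corpus in batch.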

[1] N. Chater et al. Rational models of cognition, 1998.

[2] T. Ferguson. A Bayesian analysis of some nonparametric problems, 1973.

[3] Refractor vision, 2000, The Lancet.

[4] J. Tenenbaum et al. Generalization, similarity, and Bayesian inference, 2001, The Behavioral and Brain Sciences.

[5] B. MacWhinney. The CHILDES project: tools for analyzing talk, 1992.

[6] Erik D. Thiessen et al. When cues collide: use of stress and statistical cues to word boundaries by 7- to 9-month-old infants, 2003, Developmental Psychology.

[7] J. Tenenbaum et al. Word learning as Bayesian inference, 2007, Psychological Review.

[8] R. N. Aslin et al. Statistical learning by 8-month-old infants, 1996, Science.

[9] Michael R. Brent et al. An efficient, probabilistically sound algorithm for segmentation and word discovery, 1999, Machine Learning.

[10] Jia-Wei Hong. On connectionist models, 1988.

[11] J. Saffran. The use of predictive dependencies in language learning, 2001.

[12] Daniel Swingley et al. Statistical clustering and the contents of the infant vocabulary, 2005, Cognitive Psychology.

[13] Proceedings of the 31st annual Boston University Conference on Language Development, 2007.

[14] Margaret M. Fleck. Lexicalized phonotactic word segmentation, 2008, ACL.

[15] Yuval Peres et al. Decayed MCMC filtering, 2002, UAI.

[16] Mary R. Newsome et al. The beginnings of word segmentation in English-learning infants, 1999, Cognitive Psychology.

[17] Amanda Seidl et al. Infant word segmentation revisited: edge alignment facilitates target extraction, 2006, Developmental Science.

[18] Ann M. Peters et al. The units of language acquisition, 1983.

[19] F. Craik et al. The Oxford handbook of memory, 2006.

[20] Elissa L. Newport et al. Maturational constraints on language learning, 1990, Cognitive Science.

[21] James L. McClelland et al. Letting structure emerge: connectionist and dynamical systems approaches to cognition, 2010, Trends in Cognitive Sciences.

[22] Yuval Peres et al. Decayed MCMC filtering, 2012, UAI.

[23] T. Griffiths et al. A Bayesian framework for word segmentation: exploring the effects of context, 2009, Cognition.

[24] Pierre Perruchet et al. A role for backward transitional probabilities in word segmentation?, 2008, Memory & Cognition.

[25] J. Tenenbaum et al. Probabilistic models of cognition: exploring representations and inductive biases, 2010, Trends in Cognitive Sciences.

[26] Bing Liu et al. Connectionist model, 2009, Encyclopedia of Database Systems.

[27] Jessica F. Hay et al. Learning in reverse: eight-month-old infants track backward transitional probabilities, 2009, Cognition.

[28] Scott D. Brown et al. Detecting and predicting changes, 2009, Cognitive Psychology.

[29] Thomas L. Griffiths et al. Distributional cues to word boundaries: context is important, 2008.

[30] N. Ratner. Patterns of vowel modification in mother–child speech, 1984, Journal of Child Language.

[31] Morten H. Christiansen et al. Learning to segment speech using multiple cues: a connectionist model, 1998.

[32] Charles Kemp et al. Bayesian models of cognition, 2008.

[33] Michael I. Jordan et al. Hierarchical Dirichlet processes, 2006.

[34] James L. Morgan et al. Negative evidence on negative evidence, 2004.

[35] Thomas L. Griffiths et al. Bayesian inference for PCFGs via Markov chain Monte Carlo, 2007, NAACL.

[36] Charles Yang et al. Word segmentation: quick but not dirty, 2005.

[37] Jeffrey Heinz et al. Modeling the contribution of phonotactic cues to the problem of word segmentation, 2010, Journal of Child Language.

[38] Adam N. Sanborn et al. Rational approximations to rational models: alternative algorithms for category learning, 2010, Psychological Review.

[39] Angela Baumann et al. Nine-month-olds' attention to sound similarities in syllables, 1999.

[40] P. Jusczyk et al. Phonotactic and prosodic effects on word segmentation in infants, 1999, Cognitive Psychology.

[41] J. Tenenbaum et al. Structure and strength in causal induction, 2005, Cognitive Psychology.

[42] Michael C. Frank et al. Using speakers' referential intentions to model early cross-situational word learning, 2009, Psychological Science.

[43] Elizabeth K. Johnson et al. Word segmentation by 8-month-olds: when speech cues count more than statistics, 2001.

[44] P. Jusczyk et al. Infants' sensitivity to allophonic cues for word segmentation, 1999, Perception & Psychophysics.

[45] Thomas L. Griffiths et al. Contextual dependencies in unsupervised word segmentation, 2006, ACL.

[46] Mark Johnson et al. Nonparametric Bayesian models of lexical acquisition, 2007.

[48] E. Newport et al. Computation of conditional probability statistics by 8-month-old infants, 1998.

[49] J. Morgan et al. Negative evidence on negative evidence, 1995.

[50] Morten H. Christiansen et al. Stress changes the representational landscape: evidence from word segmentation, 2005, Cognition.

[51] Adam N. Sanborn et al. Exemplar models as a mechanism for performing Bayesian inference, 2010, Psychonomic Bulletin & Review.

[52] J. Tenenbaum et al. Theory-based Bayesian models of inductive learning and reasoning, 2006, Trends in Cognitive Sciences.

[53] Eleanor Olds Batchelder et al. Bootstrapping the lexicon: a computational model of infant speech segmentation, 2002, Cognition.

[54] James L. McClelland. Letting structure emerge: connectionist and dynamical systems approaches to understanding cognition, 2009.