Modeling human performance in statistical word segmentation

The ability to discover groupings in continuous stimuli on the basis of distributional information is present across species and across perceptual modalities. We investigate the nature of the computations underlying this ability using statistical word segmentation experiments in which we vary the length of sentences, the amount of exposure, and the number of words in the languages being learned. Although the results are intuitive from the perspective of a language learner (longer sentences, less training, and a larger language all make learning more difficult), standard computational proposals fail to capture several of these results. We describe how probabilistic models of segmentation can be modified to incorporate memory or resource limitations, providing a closer match to human performance.
