The Meta-Science of Adult Statistical Word Segmentation: Part 1

We report the first set of results in a multi-year project to assess the robustness of the adult statistical word segmentation literature and the factors that promote that robustness. These results comprise eight replication experiments covering six original experiments. The purpose of these replications is to assess the reproducibility of the reported experiments, examine the replicability of their results, and provide more accurate effect size estimates. Reproducibility was mixed: several papers either lacked crucial details or contained errors in the description of the method, making it difficult to ascertain what was done. Replicability was also mixed: although in every instance we confirmed above-chance statistical word segmentation, many theoretically important moderators of that learning failed to replicate. Moreover, learning success was generally much lower than in the original studies. In the General Discussion, we consider whether these discrepancies are due to differences in subject populations, low power in the original studies, or some combination of these and other factors. We also consider whether these findings are likely to generalize to the broader statistical word segmentation literature.
