The effects of N-gram probabilistic measures on the recognition and production of four-word sequences

The present study investigates the processing and production of four-word sequences such as "I don’t really know", "at the age of", and "I think it’s the". Specifically, we investigate the influence of families of probabilistic measures: unigram, bigram, trigram, and quadgram frequency of occurrence, logarithmic (log) probability of occurrence, and mutual information. Log probability of occurrence emerged as the predominant predictor family in the onset latency analysis, suggesting that recognition is mainly underpinned by competition between a target N-gram and its family members. In contrast, the amount of experience one has with an N-gram (its frequency of occurrence) surfaced as the most prominent predictor in production. Further, probabilistic measures tied to trigrams surfaced as the best predictors in the onset latency analysis, while measures tied to unigrams were most predictive of production durations. Finally, the interactions between probabilistic measures tied to unigrams, bigrams, trigrams, and quadgrams suggest that N-grams of different lengths are processed in parallel in both recognition and production.
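To make the three measure families concrete, the following minimal sketch computes them on a toy token list. The abstract does not give the exact formulas, so the definitions here are assumptions: log probability is taken as the log conditional probability of an N-gram's final word given the preceding words, and mutual information is taken as pointwise mutual information for a bigram. The corpus, the function names, and the variable names are all hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-token sequence (frequency of occurrence)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def log_probability(ngram, counts_n, counts_prefix):
    """Assumed definition: natural log of P(last word | preceding words)."""
    return math.log(counts_n[ngram] / counts_prefix[ngram[:-1]])


def mutual_information(bigram, bigram_counts, unigram_counts):
    """Assumed definition: pointwise MI, log2 of P(w1, w2) / (P(w1) * P(w2))."""
    total_bi = sum(bigram_counts.values())
    total_uni = sum(unigram_counts.values())
    p_joint = bigram_counts[bigram] / total_bi
    p_first = unigram_counts[(bigram[0],)] / total_uni
    p_second = unigram_counts[(bigram[1],)] / total_uni
    return math.log2(p_joint / (p_first * p_second))


# Toy corpus for illustration only; it is not the study's data.
tokens = "i don't really know i don't really care i don't know".split()

uni = ngram_counts(tokens, 1)
bi = ngram_counts(tokens, 2)
tri = ngram_counts(tokens, 3)
quad = ngram_counts(tokens, 4)

target = ("i", "don't", "really", "know")
quad_frequency = quad[target]                     # quadgram frequency of occurrence
quad_log_prob = log_probability(target, quad, tri)  # log P(know | i don't really)
bigram_mi = mutual_information(("i", "don't"), bi, uni)

print(quad_frequency, quad_log_prob, bigram_mi)
```

In this sketch, frequency reflects how often the sequence itself was encountered, whereas log probability is relative to the other sequences that share the same first three words, which is one way to read the abstract's contrast between experience with an N-gram and competition among its family members.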
