The development of formulaic sequences in first and second language writing: Investigating effects of frequency, association, and native norm

Formulaic sequences are recognised as having important roles in language acquisition, processing, fluency, idiomaticity, and instruction, but there is little agreement on how to define and measure them, or on methods of corpus comparison. We argue that replicable research must be grounded in operational definitions stated in statistical terms. We adopt an experimental design and apply four corpus-analytic measures to samples of first and second language writing: n-gram frequency (Frequency-grams), association (MI-grams), phrase-frames (P-frames), and native norm (items in the Academic Formulas List, AFL-grams). We use these measures to examine and compare knowledge of formulas in first and second language acquisition as a function of proficiency and language background. We find that the different operationalizations produce different patterns of effects of expertise and L1/L2 status, and we consider the implications for corpus design and methods of analysis.
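To make three of these operationalizations concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the study. The function names are hypothetical. The MI formula is the common pointwise extension to n-grams (log2 of the observed n-gram probability over the product of the word probabilities), and phrase-frames are taken to be n-grams with one internal variable slot. The AFL-gram measure is omitted because it requires the published Academic Formulas List itself.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-token sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_grams(tokens, n):
    """Raw frequency of each n-gram in the sample."""
    return Counter(ngrams(tokens, n))


def mi_score(gram, gram_counts, word_counts, total_words):
    """Pointwise MI (assumed formula): log2(observed / expected probability)."""
    total_grams = sum(gram_counts.values())
    observed = gram_counts[gram] / total_grams
    # Expected probability if the words occurred independently of one another.
    expected = math.prod(word_counts[w] / total_words for w in gram)
    return math.log2(observed / expected)


def p_frames(tokens, n):
    """Count n-grams with one internal slot replaced by '*' (assumed definition)."""
    frames = Counter()
    for gram in ngrams(tokens, n):
        for slot in range(1, n - 1):  # first and last positions stay fixed
            frames[gram[:slot] + ("*",) + gram[slot + 1:]] += 1
    return frames
```

On a tokenised sample, `frequency_grams(tokens, 3)` ranks trigrams by raw count, `mi_score` rewards sequences whose words co-occur more often than chance predicts, and `p_frames(tokens, 4)` groups variants such as "it is clear that" and "it is likely that" under the single frame "it is * that". Because the three measures rank sequences on different grounds, they need not agree on which sequences count as formulaic.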
