The most frequent opaque formulaic sequences in English-medium college textbooks

Abstract This paper describes an attempt to establish a pedagogically useful list of the most frequent semantically non-transparent formulaic sequences for non-English majors in EFL contexts who need to read the textbooks of their fields in English. The list was compiled from a corpus of 20 million running words drawn from two hundred college textbooks across forty subject areas. To identify opaque formulae in widespread use, we applied a set of screening criteria, combining extraction with the program Collocate and manual checking. Based on frequency, range, meaningfulness, grammatical well-formedness, and semantic non-compositionality, a total of 475 opaque formulaic sequences of 2–5 words were selected; together they accounted for approximately 2.08% of the running words in the corpus. The identified formulae were then tested against a frequency threshold in the 120 million words of academic texts in the 450-million-token Corpus of Contemporary American English (COCA) to verify whether they merit pedagogical attention. As with other word lists, it is hoped that this phrase list may serve as a reference for EAP teaching.
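The frequency and range screening mentioned above can be illustrated with a minimal sketch. It is not the procedure implemented in Collocate or the authors' actual pipeline: the function names, the whitespace tokenization, and the threshold values are all assumptions made for illustration. The remaining criteria (meaningfulness, grammatical well-formedness, and semantic non-compositionality) depend on manual judgment and are not modeled here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def screen_sequences(texts, min_freq, min_range, sizes=(2, 3, 4, 5)):
    """Keep 2-5 word sequences that meet hypothetical frequency and range cut-offs.

    frequency = total occurrences across all texts
    range     = number of texts in which the sequence occurs at least once
    """
    freq = Counter()
    text_range = Counter()
    for text in texts:
        # Assumption: naive lowercase whitespace tokenization.
        tokens = text.lower().split()
        seen_in_text = set()
        for n in sizes:
            grams = ngrams(tokens, n)
            freq.update(grams)
            seen_in_text.update(grams)
        # Each text contributes at most 1 to the range of a sequence.
        text_range.update(seen_in_text)
    return {
        " ".join(gram): (freq[gram], text_range[gram])
        for gram in freq
        if freq[gram] >= min_freq and text_range[gram] >= min_range
    }


if __name__ == "__main__":
    # Toy stand-ins for textbook texts; the thresholds are illustrative only.
    sample = [
        "in terms of cost in terms of time",
        "in terms of scale",
        "by and large it works",
    ]
    for seq, (f, r) in sorted(screen_sequences(sample, 3, 2).items()):
        print(seq, f, r)
```

A list produced this way would still contain transparent sequences and fragments such as "terms of", which is why the semantic and structural criteria require a further manual pass.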
