The creation and application of a large-scale corpus-based academic multi-word unit list

Abstract: This paper describes the construction of a corpus-based list of multi-word units that occur in academic English. Using up-to-date, reliable methods, the project aimed to produce a large-scale resource that can be studied directly or used by practitioners as a reference for creating further materials. The paper details the procedures used to generate this academic multi-word unit list, explains the decisions made to identify useful items, and discusses the resulting resource. It compares the new list with existing lists, and compares its characteristics with those of high-frequency general English word lists. Finally, it suggests applications of this free resource for English practitioners and students.
