The Corpus of Contemporary American English as the first reliable monitor corpus of English

The Corpus of Contemporary American English is the first large, genre-balanced corpus of any language that has been designed and constructed from the ground up as a 'monitor corpus', and that can be used to accurately track and study recent changes in the language. The 400-million-word corpus is evenly divided among five genres: spoken, fiction, popular magazines, newspapers, and academic journals. Most importantly, the genre balance stays almost exactly the same from year to year, which allows the corpus to accurately model changes in the 'real world'. After discussing the corpus design, we provide a number of concrete examples of how the corpus can be used to look at recent changes in English, including morphology (new suffixes -friendly and -gate), syntax (including prescriptive rules, quotative like, so not ADJ, the get passive, resultatives, and verb complementation), semantics (such as changes in meaning with web, green, or gay), and lexis, including word and phrase frequency by year and the use of the corpus architecture to produce lists of all words that have had large shifts in frequency between specific historical periods.
