Discovering the hidden treasure on the Internet: using Google to uncover the veil of phraseology

Formulaic speech has been notoriously difficult to define and identify, despite its crucial importance to native-like fluency and idiomaticity. In this article, I introduce a way of identifying phraseological units in a running text. I am interested in recurrent fragments such as ‘charged with crimes against humanity’, which involve multi-word collocations in a ‘fuzzily fixed’ lexico-syntactic frame. I suspect that phraseological fragments of this kind not only add to the fluency and idiomaticity of texts and save text production time, but also serve as milestones in the generation of texts and sentences. No well-known corpus currently available seems large enough to provide adequate instances of prefabricated chunks like these for closer investigation. It is proposed here that the Internet, treated as a gigantic corpus, together with a search engine such as Google, can help identify and retrieve these phraseological units for linguistic research as well as for language teaching and learning.
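The abstract does not spell out the retrieval procedure, so what follows is only a minimal sketch of how the idea might be put into practice. It is not the author's method. It assumes two hypothetical steps. First, recurrent word n-grams are pulled out of a running text as candidate phraseological units. Second, a candidate is turned into exact-phrase search queries in which one interior word is replaced by `*`. This second step assumes the search engine treats `*` inside a quoted phrase as a placeholder for unknown words, as Google has historically documented. The function names `candidate_ngrams` and `wildcard_frames` are illustrative only. The sketch builds query strings and does not call any search API.

```python
import re
from collections import Counter


def candidate_ngrams(text: str, n: int, min_count: int = 2) -> list[tuple[str, int]]:
    """Hypothetical helper: return word n-grams that occur at least
    min_count times in text, most frequent first (ties in first-seen order)."""
    # Lower-case the text and keep only word tokens (letters, digits, apostrophes).
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Count every contiguous sequence of n tokens.
    grams = Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(gram, count) for gram, count in grams.most_common() if count >= min_count]


def wildcard_frames(fragment: str) -> list[str]:
    """Hypothetical helper: build exact-phrase queries in which one interior
    word is replaced by *, approximating a 'fuzzily fixed' frame."""
    words = fragment.split()
    frames = []
    # Keep the first and last words as anchors and vary each interior slot in turn.
    for i in range(1, len(words) - 1):
        slot = words[:i] + ["*"] + words[i + 1:]
        frames.append('"' + " ".join(slot) + '"')
    return frames


if __name__ == "__main__":
    text = ("He was charged with crimes against humanity. "
            "Two generals were charged with crimes against humanity in 1998.")
    print(candidate_ngrams(text, 5))
    # [('charged with crimes against humanity', 2)]
    for query in wildcard_frames("charged with crimes against humanity"):
        print(query)
    # "charged * crimes against humanity"
    # "charged with * against humanity"
    # "charged with crimes * humanity"
```

Anchoring the first and last words while varying one interior slot is just one simple way to approximate a ‘fuzzily fixed’ frame. A fuller treatment might vary several slots at once and filter candidates by the number of hits each query returns. This sketch does not attempt either.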
