Supporting Collocation Learning

Collocations are of great importance for second language learners. Knowledge of them plays a key role in producing language accurately and fluently, but such knowledge is difficult to acquire, simply because there is so much of it. Collocation resources for learners are limited. Printed dictionaries are restricted in size and provide only rudimentary search and retrieval options. Free online resources are rare, and learners find the language data they offer hard to interpret. Online collocation exercises are inadequate and scattered, making it difficult to acquire collocations in a systematic way.

This thesis makes two claims: (1) corpus data can be presented in different ways to facilitate effective collocation learning, and (2) a computer system can be constructed to help learners systematically strengthen and enhance their collocation knowledge.

To investigate the first claim, an enormous Web-derived corpus was processed, filtered, and organized into three searchable digital library collections that support different aspects of collocation learning. Each of these constitutes a vast concordance whose entries are presented in ways that help students use collocations more effectively in their writing. To provide extended context, concordance data is linked to illustrative sample sentences, both on the live Web and in the British National Corpus. Two evaluations were conducted, both of which suggest that these collections help improve student writing.

For the second claim, a system was built that uses natural language processing techniques to automatically identify collocations in texts that teachers or students provide. With it, students study, collect, and store collocations of interest while reading, and teachers construct collocation exercises to consolidate what students have learned and amplify their knowledge. The system was evaluated with teachers and students in classroom settings, and positive outcomes were demonstrated.

We believe that the deployment of computer-based collocation learning systems is an exciting development that will transform language learning.
