Improving Collocation Correction by Ranking Suggestions Using Linguistic Knowledge

The importance of collocations in second language learning is generally acknowledged. Studies show that the “collocation density” in learner corpora is nearly the same as in native corpora, i.e., learners use collocations as often as native speakers do, while the collocation error rate in learner corpora is about ten times as high as in native reference corpora. CALL could therefore be of great help in supporting learners to master collocations. Surprisingly, however, few works specifically address CALL-oriented collocation learning assistants that either detect miscollocations in learners' writing and propose corrections, or allow learners to verify whether a word co-occurrence is a correct collocation and obtain correction suggestions if it turns out to be a miscollocation. This neglect is likely due, on the one hand, to the focus of CALL research so far on grammatical matters and, on the other hand, to the complexity of the problem. To provide an adequate correction of a miscollocation, the collocation learning assistant must “guess” the meaning the learner intended to express. This makes it very different from grammar and spell checkers, which can draw on the grammatical and orthographic regularities of a language, respectively. In this paper, we focus on providing a ranked list of correction suggestions in a setting where the learner submits a collocation for verification and, if it is a miscollocation, obtains a list of suggested corrections.
We show that the retrieval and ranking of the suggestions benefit greatly from NLP techniques that provide the syntactic dependency structure and subcategorization information of the word co-occurrences, and from a weighted Pointwise Mutual Information (PMI) that reflects the asymmetry of collocations: the base is freely chosen by the speaker, while the choice of the collocate is restricted by the base.
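The abstract does not spell out the weighted PMI formula. The following is a minimal sketch under stated assumptions of how dependency-based retrieval and an asymmetric ranking could fit together. Candidate collocates are retrieved because they occur with the learner's base in the same dependency relation. They are then ranked by a *hypothetical* score that mixes standard PMI with log P(c | b), so that the collocate is conditioned on the base. The triple counts, the `dobj` label, the weight `alpha` and all function names are invented for illustration; this is not the authors' weighting.

```python
import math
from collections import Counter

# Toy counts of (collocate, relation, base) triples of the kind a dependency
# parser could extract from a corpus. All numbers are invented for illustration.
TRIPLES = Counter({
    ("make", "dobj", "decision"): 120,
    ("take", "dobj", "decision"): 40,
    ("reach", "dobj", "decision"): 25,
    ("have", "dobj", "decision"): 10,
    ("make", "dobj", "mistake"): 90,
    ("take", "dobj", "walk"): 30,
    ("have", "dobj", "time"): 200,
})


def relation_marginals(triples, relation):
    """Return collocate counts, base counts and the total for one relation."""
    coll, base, total = Counter(), Counter(), 0
    for (c, rel, b), n in triples.items():
        if rel == relation:
            coll[c] += n
            base[b] += n
            total += n
    return coll, base, total


def asymmetric_score(joint, coll_n, base_n, total, alpha):
    """Hypothetical weighting: alpha * PMI(b, c) + (1 - alpha) * log2 P(c | b).

    PMI alone is symmetric in b and c; the conditional term P(c | b) treats
    the base as given and the collocate as restricted by it."""
    pmi = math.log2(joint * total / (coll_n * base_n))
    cond = math.log2(joint / base_n)
    return alpha * pmi + (1 - alpha) * cond


def suggest(collocate, base, triples, relation="dobj", alpha=0.5):
    """Rank alternative collocates attested with the same base and relation."""
    coll, base_counts, total = relation_marginals(triples, relation)
    scores = {
        c: asymmetric_score(n, coll[c], base_counts[base], total, alpha)
        for (c, rel, b), n in triples.items()
        if rel == relation and b == base and c != collocate
    }
    return sorted(scores, key=scores.get, reverse=True)


print(suggest("do", "decision", TRIPLES))  # ['make', 'reach', 'take', 'have']
print(suggest("do", "decision", TRIPLES, alpha=1.0)[0])  # reach (plain PMI)
```

For the miscollocation *do a decision*, the mixed score puts *make* first, whereas plain PMI (`alpha=1.0`) promotes the rarer *reach*. This illustrates the well-known tendency of PMI to overrate low-frequency pairs, and shows how a base-conditioned term can change the ranking.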