Automated Suggestions for Miscollocations

Collocation errors, such as "learn knowledge" instead of "gain knowledge" or "acquire knowledge", or "make damage" rather than "cause damage", are among the most common and persistent error types in second language writing. In this work-in-progress report, we propose a probabilistic model for suggesting corrections to lexical collocation errors. The model incorporates three features: word association strength (mutual information, MI), semantic similarity (via WordNet), and the notion of shared collocations (or intercollocability). The results suggest that combining all three features outperforms any single feature and any combination of two features.
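To make the word association feature concrete, the following is a minimal sketch of ranking candidate verbs for a noun by pointwise mutual information. It is an illustration only, not the model described in the report: the co-occurrence counts are invented, the names `pair_counts`, `pmi`, and `rank_candidates` are hypothetical, and the semantic similarity and shared-collocation features are left out.

```python
import math
from collections import Counter

# Toy verb-noun co-occurrence counts, invented for illustration only.
pair_counts = Counter({
    ("gain", "knowledge"): 30,
    ("acquire", "knowledge"): 25,
    ("learn", "knowledge"): 1,
    ("learn", "lesson"): 40,
    ("gain", "experience"): 35,
    ("acquire", "skill"): 20,
})


def pmi(verb, noun, counts):
    """Pointwise mutual information (log base 2) of a verb-noun pair."""
    total = sum(counts.values())
    joint = counts[(verb, noun)]
    if joint == 0:
        # Unseen pair: treat as the weakest possible association.
        return float("-inf")
    verb_total = sum(c for (v, _), c in counts.items() if v == verb)
    noun_total = sum(c for (_, n), c in counts.items() if n == noun)
    # log2( P(verb, noun) / (P(verb) * P(noun)) )
    return math.log2(joint * total / (verb_total * noun_total))


def rank_candidates(noun, candidates, counts):
    """Order candidate verbs by association strength with the noun."""
    return sorted(candidates, key=lambda v: pmi(v, noun, counts), reverse=True)


if __name__ == "__main__":
    for verb in rank_candidates("knowledge", ["learn", "gain", "acquire"], pair_counts):
        print(verb, round(pmi(verb, "knowledge", pair_counts), 3))
```

With these toy counts, the miscollocate "learn" scores below both "gain" and "acquire" for the noun "knowledge". In practice the counts would come from a large reference corpus, and this score would be only one of the three features the model combines.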
