Informing Determiner and Preposition Error Correction with Hierarchical Word Clustering

We extend our n-gram-based data-driven prediction approach from the Helping Our Own (HOO) 2011 Shared Task (Boyd and Meurers, 2011) to identify determiner and preposition errors in non-native English essays from the Cambridge Learner Corpus FCE Dataset (Yannakoudakis et al., 2011) as part of the HOO 2012 Shared Task. Our system focuses on three error categories: missing determiner, incorrect determiner, and incorrect preposition. Approximately two-thirds of the errors annotated in the HOO 2012 training and test data fall into these three categories. To improve our approach, we developed a missing determiner detector and incorporated word clustering (Brown et al., 1992) into the n-gram prediction approach.

[1] Timothy Baldwin, et al. Learning the Countability of English Nouns from Corpus Data, 2003, ACL.

[2] Randy Goebel, et al. Web-Scale N-gram Models for Lexical Disambiguation, 2009, IJCAI.

[3] Walt Detmar Meurers, et al. Data-Driven Correction of Function Words in Non-Native English, 2011, ENLG.

[4] van Gerardus Noord, et al. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), 2010.

[5] Silvia Bernardini, et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora, 2009, Lang. Resour. Evaluation.

[6] Helen Yannakoudakis, et al. A New Dataset and Method for Automatically Grading ESOL Texts, 2011, ACL.

[7] Dan Roth, et al. Algorithm Selection and Model Adaptation for ESL Correction Tasks, 2011, ACL.

[8] Walt Detmar Meurers, et al. Exploring the Data-Driven Prediction of Prepositions in English, 2010, COLING.

[9] Raúl Aranovich. The Proceedings of the Thirteenth West Coast Conference on Formal Linguistics, 1995.

[10] Rachele De Felice, et al. Automatic error detection in non-native English, 2008.

[11] Martin Chodorow, et al. Native Judgments of Non-Native Usage: Experiments in Preposition Error Detection, 2008, COLING 2008.

[12] Jianfeng Gao, et al. Using Contextual Speller Techniques and Language Modeling for ESL Error Correction, 2008, IJCNLP.

[13] Na-Rae Han, et al. Detecting Errors in English Article Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus, 2004, LREC.

[14] Stephen Wechsler, et al. Preposition Selection Outside the Lexicon, 1995.

[15] Dan Roth, et al. Modeling Discriminative Global Inference, 2007, International Conference on Semantic Computing (ICSC 2007).

[16] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[17] Treebank Penn, et al. Linguistic Data Consortium, 1999.

[18] Percy Liang, et al. Semi-Supervised Learning for Natural Language, 2005.

[19] Adam Kilgarriff, et al. Helping Our Own: The HOO 2011 Pilot Shared Task, 2011, ENLG.

[20] Francis Bond, et al. Memory-Based Learning for Article Generation, 2000, CoNLL/LLL.