Informing Determiner and Preposition Error Correction with Hierarchical Word Clustering

We extend our n-gram-based data-driven prediction approach from the Helping Our Own (HOO) 2011 Shared Task (Boyd and Meurers, 2011) to identify determiner and preposition errors in non-native English essays from the Cambridge Learner Corpus FCE Dataset (Yannakoudakis et al., 2011) as part of the HOO 2012 Shared Task. Our system focuses on three error categories: missing determiner, incorrect determiner, and incorrect preposition. Approximately two-thirds of the errors annotated in HOO 2012 training and test data fall into these three categories. To improve our approach, we developed a missing determiner detector and incorporated word clustering (Brown et al., 1992) into the n-gram prediction approach.
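The abstract gives no implementation details, so the following is only a minimal sketch of how an n-gram prediction step with a word-cluster backoff might look. It is not the authors' system. The counts, the cluster ID `VERB_C1`, the candidate list, and the function names `predict_preposition` and `flag_error` are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy n-gram counts; real systems would use a large corpus.
NGRAM_COUNTS = Counter({
    ("depends", "on", "the"): 50,
    ("depends", "of", "the"): 1,
    ("VERB_C1", "on", "the"): 80,   # same trigram with a cluster ID as context
    ("VERB_C1", "in", "the"): 5,
})

# Hypothetical word-to-cluster mapping (e.g. from a hierarchical clustering).
WORD_CLUSTERS = {"depends": "VERB_C1", "relies": "VERB_C1"}

# Hypothetical candidate set; the source does not list the prepositions used.
PREPOSITIONS = ["on", "of", "in", "at", "to"]


def predict_preposition(left, right, candidates=PREPOSITIONS):
    """Pick the candidate with the highest count in (left, candidate, right)."""

    def best(left_token):
        scored = {c: NGRAM_COUNTS[(left_token, c, right)] for c in candidates}
        top = max(scored, key=scored.get)
        # Treat an all-zero score table as "no evidence".
        return top if scored[top] > 0 else None

    # Try the surface word first, then back off to its cluster ID.
    choice = best(left)
    if choice is None and left in WORD_CLUSTERS:
        choice = best(WORD_CLUSTERS[left])
    return choice


def flag_error(left, written, right):
    """Return a suggested replacement if the prediction disagrees, else None."""
    predicted = predict_preposition(left, right)
    if predicted is not None and predicted != written:
        return predicted
    return None
```

In this toy setup, `flag_error("relies", "in", "the")` finds no counts for the surface word "relies", backs off to the shared cluster, and suggests a replacement from the cluster-level counts. Backing off to clusters is one plausible way to reduce data sparsity for context words that are rare or unseen in the n-gram data.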

Walt Detmar Meurers | Adriane Boyd | Marion Zepf

[1] Timothy Baldwin, et al. Learning the Countability of English Nouns from Corpus Data, 2003, ACL.

[2] Randy Goebel, et al. Web-Scale N-gram Models for Lexical Disambiguation, 2009, IJCAI.

[3] Walt Detmar Meurers, et al. Data-Driven Correction of Function Words in Non-Native English, 2011, ENLG.

[4] van Gerardus Noord, et al. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), 2010.

[5] Silvia Bernardini, et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora, 2009, Lang. Resour. Evaluation.

[6] Helen Yannakoudakis, et al. A New Dataset and Method for Automatically Grading ESOL Texts, 2011, ACL.

[7] Dan Roth, et al. Algorithm Selection and Model Adaptation for ESL Correction Tasks, 2011, ACL.

[8] Walt Detmar Meurers, et al. Exploring the Data-Driven Prediction of Prepositions in English, 2010, COLING.

[9] Raúl Aranovich. The Proceedings of the Thirteenth West Coast Conference on Formal Linguistics, 1995.

[10] Rachele De Felice, et al. Automatic error detection in non-native English, 2008.

[11] Martin Chodorow, et al. Native Judgments of Non-Native Usage: Experiments in Preposition Error Detection, 2008, COLING.

[12] Jianfeng Gao, et al. Using Contextual Speller Techniques and Language Modeling for ESL Error Correction, 2008, IJCNLP.

[13] Na-Rae Han, et al. Detecting Errors in English Article Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus, 2004, LREC.

[14] Stephen Wechsler, et al. Preposition Selection Outside the Lexicon, 1995.

[15] Dan Roth, et al. Modeling Discriminative Global Inference, 2007, International Conference on Semantic Computing (ICSC 2007).

[16] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[17] Treebank Penn, et al. Linguistic Data Consortium, 1999.

[18] Percy Liang, et al. Semi-Supervised Learning for Natural Language, 2005.

[19] Adam Kilgarriff, et al. Helping Our Own: The HOO 2011 Pilot Shared Task, 2011, ENLG.

[20] Francis Bond, et al. Memory-Based Learning for Article Generation, 2000, CoNLL/LLL.