Addressing Class Imbalance in Grammatical Error Detection with Evaluation Metric Optimization

We address the problem of class imbalance in supervised grammatical error detection (GED) for non-native speaker text, which arises because erroneous examples make up a small proportion of the data compared to the large number of error-free examples. Most learning algorithms maximize accuracy, which is not a suitable objective for such imbalanced data. Most GED systems address this issue by tuning hyperparameters to maximize metrics like Fβ. Instead, we show that classifiers whose model parameters are learned by directly optimizing evaluation metrics like F1 and F2 deliver better performance on these metrics than traditional sampling and cost-sensitive learning solutions for addressing class imbalance. Optimizing these metrics is useful in recall-oriented grammatical error detection scenarios. We also show that there are inherent difficulties in optimizing precision-oriented evaluation metrics like F0.5. We establish this through a systematic evaluation on multiple datasets and different GED tasks.
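To make the role of β concrete, the following minimal sketch computes accuracy and Fβ from confusion-matrix counts. The function names and all counts are hypothetical, chosen only for illustration; they are not taken from the paper's experiments. The sketch shows why accuracy is misleading under class imbalance and how F2 rewards recall while F0.5 rewards precision.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all examples that are classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_beta(tp, fp, fn, beta):
    """F-beta score from counts.

    beta > 1 weights recall more heavily (e.g. F2);
    beta < 1 weights precision more heavily (e.g. F0.5).
    """
    if tp == 0:
        return 0.0  # no true positives: the score is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical imbalanced data: 50 erroneous and 950 error-free examples.
# A classifier that never flags an error is 95% accurate but useless.
baseline_acc = accuracy(tp=0, fp=0, fn=50, tn=950)
baseline_f1 = f_beta(tp=0, fp=0, fn=50, beta=1.0)

# Hypothetical recall-oriented detector: precision 0.4, recall 0.8.
f1 = f_beta(tp=80, fp=120, fn=20, beta=1.0)
f2 = f_beta(tp=80, fp=120, fn=20, beta=2.0)
f05 = f_beta(tp=80, fp=120, fn=20, beta=0.5)

print(f"baseline accuracy={baseline_acc:.2f}, F1={baseline_f1:.2f}")
print(f"detector F1={f1:.3f}, F2={f2:.3f}, F0.5={f05:.3f}")
```

With these counts the same detector ranks very differently depending on β, which is why the choice of target metric matters when it is optimized directly.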
