High-Order Sequence Modeling for Language Learner Error Detection

We address the problem of detecting English language learner errors by using a discriminative high-order sequence model. Unlike most work in error detection, this method is agnostic as to specific error types, thus potentially allowing for higher recall across different error types. The approach integrates features from many sources into the error-detection model, ranging from language-model-based features to linguistic analysis features. Evaluation results on a large annotated corpus of learner writing indicate the feasibility of our approach on a realistic, noisy and inherently skewed set of data. High-order models consistently outperform low-order models in our experiments. Error analysis of the output shows that precision measured on the test set is a lower bound on the real system performance.
