Correcting Comma Errors in Learner Essays, and Restoring Commas in Newswire Text

While the field of grammatical error detection has progressed over the past few years, comma placement, an area of particular difficulty for both native and non-native learners of English, has been largely ignored. We present a system for comma error correction in English that achieves an average of 89% precision and 25% recall on two corpora of unedited student essays. The system also achieves state-of-the-art performance on the sister task of restoring commas in well-formed text. For both tasks, we show that novel features that encode long-distance information improve upon the more lexically driven features used in prior work.
