A Comparative Evaluation of Deep and Shallow Approaches to the Automatic Detection of Common Grammatical Errors

This paper compares a deep and a shallow processing approach to the problem of classifying a sentence as grammatically well-formed or ill-formed. The deep processing approach uses the XLE LFG parser and English grammar. Two versions are presented: one uses the XLE directly to perform the classification, and the other uses a decision tree trained on features consisting of the XLE’s output statistics. The shallow processing approach predicts grammaticality from n-gram frequency statistics. Again we present two versions: one uses frequency thresholds, and the other uses a decision tree trained on the frequencies of the rarest n-grams in the input sentence. We find that the decision tree improves on the basic approach only in the deep, parser-based case. We also show that combining the shallow and deep decision tree features is effective. Our evaluation is carried out on a large test set of grammatical and ungrammatical sentences; the ungrammatical sentences are generated automatically by inserting grammatical errors into well-formed BNC sentences.
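The threshold variant of the shallow approach can be illustrated with a minimal sketch. It assumes whitespace tokenisation, bigrams, a three-sentence toy reference corpus, and a threshold of 1. None of these are taken from the paper, which does not state its n-gram order, reference corpus, or threshold values in this abstract, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(corpus, n):
    """Count n-gram frequencies over a reference corpus of sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


def rarest_frequency(sentence, counts, n):
    """Frequency of the rarest n-gram in the sentence (0 if it has none)."""
    grams = ngrams(sentence.lower().split(), n)
    if not grams:
        return 0  # assumption: sentences shorter than n are treated as unseen
    return min(counts[g] for g in grams)


def is_grammatical(sentence, counts, n=2, threshold=1):
    """Predict well-formedness: the rarest n-gram must reach the threshold."""
    return rarest_frequency(sentence, counts, n) >= threshold


# Toy reference corpus (illustrative only, standing in for a large corpus).
reference = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on a rug",
]
bigram_counts = count_ngrams(reference, 2)

print(is_grammatical("the cat sat on the rug", bigram_counts))   # all bigrams seen
print(is_grammatical("the cat on sat the rug", bigram_counts))   # contains unseen bigrams
```

In the decision tree variant described above, values such as `rarest_frequency` for several n-gram orders would serve as input features to a learned classifier in place of a hand-set threshold.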
