Mining Sequential Patterns and Tree Patterns to Detect Erroneous Sentences

An important application of erroneous-sentence detection is providing feedback to writers of English as a Second Language. The problem is difficult because both erroneous and correct sentences are highly diverse. In this paper, we propose a novel approach to identifying erroneous sentences. We first mine labeled tree patterns and sequential patterns to characterize both erroneous and correct sentences. The discovered patterns are then used in two ways to distinguish correct sentences from erroneous ones: (1) the patterns are transformed into sentence features for existing classification models, e.g., SVM; (2) the patterns are used to build a rule-based classification model. Experimental results show that both techniques are promising and that the second outperforms the first. Moreover, the classification model in the second technique is easy to understand, so we can provide intuitive explanations for its classification results.
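The two uses of mined patterns described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It covers sequential patterns only (tree patterns are omitted), assumes a pattern matches when its tokens appear in order with gaps allowed, and assumes the rule-based model picks the matching rule with the highest confidence. All names (`contains_subsequence`, `to_features`, `classify_by_rules`), the example patterns, and the confidence values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains_subsequence(sentence: Sequence[str], pattern: Pattern) -> bool:
    """Return True if the pattern's tokens occur in the sentence in order.

    Gaps between matched tokens are allowed (an assumption of this sketch).
    """
    remaining = iter(sentence)
    # `token in remaining` advances the iterator past the first match,
    # so later pattern tokens can only match later sentence positions.
    return all(token in remaining for token in pattern)


def to_features(sentence: Sequence[str], patterns: List[Pattern]) -> List[int]:
    """Use (1): one binary feature per mined pattern, for a model such as an SVM."""
    return [1 if contains_subsequence(sentence, p) else 0 for p in patterns]


def classify_by_rules(
    sentence: Sequence[str],
    rules: Dict[Pattern, Tuple[str, float]],
    default: str = "correct",
) -> str:
    """Use (2): label the sentence with the highest-confidence matching rule.

    `rules` maps a pattern to a (label, confidence) pair. The selection
    strategy is an illustrative assumption, not the paper's method.
    """
    best_label, best_conf = default, 0.0
    for pattern, (label, conf) in rules.items():
        if conf > best_conf and contains_subsequence(sentence, pattern):
            best_label, best_conf = label, conf
    return best_label


# Hypothetical patterns and rules, for illustration only.
patterns = [("this", "are"), ("this", "is")]
rules = {
    ("this", "are"): ("erroneous", 0.9),
    ("this", "is"): ("correct", 0.8),
}
sentence = ["this", "books", "are", "good"]

features = to_features(sentence, patterns)
label = classify_by_rules(sentence, rules)
```

The rule that fires in `classify_by_rules` doubles as a human-readable explanation of the decision, which is the kind of interpretability the abstract attributes to the rule-based model.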
