Automatic error detection in non-native English

This thesis describes the development of DAPPER ('Determiner And PrePosition Error Recogniser'), a system designed to automatically acquire models of occurrence for English prepositions and determiners, to allow for the detection and correction of errors in their usage, especially in the writing of non-native speakers of the language. The focus is on prepositions and determiners because they are parts of speech whose usage is particularly challenging to acquire, both for students of the language and for natural language processing tools. The thesis addresses this problem by developing a system which can acquire models of correct preposition and determiner occurrence and use this knowledge to identify divergences from these models as errors. The contexts of these parts of speech are represented by a feature set incorporating a variety of semantic and syntactic elements. DAPPER is found to perform well on preposition and determiner selection tasks in correct native English text. Results on each preposition and determiner are discussed in detail to understand the possible reasons for variations in performance, and whether these are due to problems with the structure of DAPPER or to deeper linguistic reasons. An in-depth analysis of all features used is also offered, quantifying the contribution of each feature individually. This helps establish whether the decision to include complex semantic and syntactic features is justified in the context of this task. Finally, the performance of DAPPER on non-native English text is assessed. The system is found to be robust when applied to text which does not contain any preposition or determiner errors. On an error correction task, results are mixed: DAPPER shows promising results on preposition selection and determiner confusion (definite vs. indefinite) errors, but is less successful in detecting errors involving missing or extraneous determiners.
Several characteristics of learner writing are described, to gain a clearer understanding of what problems arise when natural language processing tools are used with this kind of text. It is concluded that the construction of contextual models is a viable approach to the task of preposition and determiner selection, despite outstanding issues pertaining to the domain of non-native writing.
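
The general idea of learning a contextual model from correct text and flagging divergences from it can be illustrated with a minimal sketch. This is not DAPPER's classifier or feature set: the confusion set, the two-element context (head verb, object noun), the training pairs, the add-one smoothed count model and the confidence threshold are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Toy confusion set (assumption; a real system would cover many more prepositions).
PREPOSITIONS = ["in", "on", "at"]

# Hypothetical training data standing in for correct native text:
# (head verb, object noun) -> preposition observed in that context.
TRAINING = [
    (("arrive", "station"), "at"),
    (("arrive", "airport"), "at"),
    (("live", "london"), "in"),
    (("live", "city"), "in"),
    (("sit", "chair"), "on"),
    (("rely", "friend"), "on"),
]


def train(examples):
    """Count how often each preposition co-occurs with each feature value."""
    counts = defaultdict(Counter)
    for features, prep in examples:
        for position, value in enumerate(features):
            counts[(position, value)][prep] += 1
    return counts


def predict(counts, features):
    """Return the most likely preposition and its normalised score."""
    scores = {}
    for prep in PREPOSITIONS:
        score = 1.0
        for position, value in enumerate(features):
            seen = counts[(position, value)]
            # Add-one smoothing so unseen feature values do not zero the score.
            score *= (seen[prep] + 1) / (sum(seen.values()) + len(PREPOSITIONS))
        scores[prep] = score
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


def check(counts, features, used, threshold=0.5):
    """Suggest a correction only when the model confidently disagrees with the writer."""
    best, confidence = predict(counts, features)
    if best != used and confidence >= threshold:
        return best
    return None


model = train(TRAINING)
print(check(model, ("arrive", "station"), "in"))  # prints "at"
print(check(model, ("arrive", "station"), "at"))  # prints "None"
```

The threshold reflects one possible design choice: in a context the model has never seen, all candidates score equally, so no correction is proposed rather than an arbitrary one.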
