Using edit distance to analyse errors in a natural language to logic translation corpus

We have assembled a large corpus of student submissions to an automatic grading system, where the subject matter involves the translation of natural language sentences into propositional logic. Of the 2.3 million translation instances in the corpus, 286,000 (approximately 12%) are categorised as being in error. We want to understand the nature of the errors that students make, so that we can develop tools and supporting infrastructure that help students with the problems that these errors represent. With this aim in mind, this paper describes an analysis of a significant proportion of the data. The analysis uses the edit distance between each incorrect answer and its corresponding correct solution, together with the associated edit sequence, as a means of organising the data and detecting categories of errors. We demonstrate that a large proportion of errors can be accounted for by a small number of relatively simple error types, and that the method draws attention to interesting phenomena in the data set.
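To make the idea of an edit distance with an associated edit sequence concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes that answers are already split into tokens, that the distance is a plain Levenshtein distance with unit costs, and that the function name `edit_script` and the example formulas are hypothetical illustrations rather than items from the corpus.

```python
from typing import List, Tuple

# An edit is (operation, token in the student's answer, token in the solution).
Edit = Tuple[str, str, str]


def edit_script(source: List[str], target: List[str]) -> Tuple[int, List[Edit]]:
    """Return the unit-cost Levenshtein distance and one minimal edit sequence.

    `source` is the (hypothetical) incorrect answer and `target` the correct
    solution, both given as token lists. Unit costs are an assumption.
    """
    m, n = len(source), len(target)
    # dist[i][j] = distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i
    for j in range(1, n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a source token
                dist[i][j - 1] + 1,         # insert a target token
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from the bottom-right cell to recover one edit sequence.
    edits: List[Edit] = []
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1  # tokens match, no edit recorded
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            edits.append(("substitute", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("delete", source[i - 1], ""))
            i -= 1
        else:
            edits.append(("insert", "", target[j - 1]))
            j -= 1
    edits.reverse()
    return dist[m][n], edits


# Hypothetical example: the student reverses the direction of a conditional.
distance, edits = edit_script(["A", "->", "B"], ["B", "->", "A"])
print(distance, edits)
```

Grouping incorrect answers by their edit sequence (for instance, all answers whose only edit substitutes one connective for another) is one way such output could be used to surface candidate error categories. Whether tokens, characters, or tree nodes are the right unit of comparison is a design choice this sketch does not settle.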
