An Empirical Study of Errors in Translating Natural Language into Logic

Every teacher of logic knows that the ease with which a student can translate a natural language sentence into formal logic depends, amongst other things, on just how that natural language sentence is phrased. This paper reports findings from a pilot study of a large-scale corpus in formal logic education, in which we used the dataset to provide empirical evidence for specific characteristics of natural language problem statements that frequently lead students to make mistakes. We developed a rich taxonomy of the types of errors that students make, and implemented tools for automatically classifying student errors into these categories. In this paper, we focus on three phenomena that were prevalent in our data: students were found (a) to have particular difficulty distinguishing the conditional from the biconditional, (b) to be sensitive to word-order effects during translation, and (c) to be sensitive to factors associated with the naming of constants. We conclude by considering the implications of this kind of large-scale empirical study for improving an automated assessment system specifically, and logic teaching more generally.
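As a hypothetical illustration of the first phenomenon (the sentence and symbolisation here are our own, not drawn from the study's corpus), consider translating "You may board the train only if you have a valid ticket", with B standing for "you may board the train" and T for "you have a valid ticket":

\[
  \underbrace{B \rightarrow T}_{\text{intended: conditional}}
  \qquad\text{vs.}\qquad
  \underbrace{B \leftrightarrow T}_{\text{typical error: biconditional}}
\]

The "only if" construction makes the ticket a necessary condition for boarding, so the intended formalisation is the conditional; students who read the sentence as also promising that a ticket guarantees boarding instead produce the biconditional.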