Students find logic hard. In particular, they seem to find it hard to translate natural language sentences into their corresponding representations in logic. As an enabling step towards determining why this is the case, this paper presents the public release of a corpus of over 4.5 million translations of natural language (NL) sentences into first-order logic (FOL), provided by 55,000 students from almost 50 countries over a period of 10 years. The translations, which the students submitted as FOL renderings of a collection of 275 NL sentences, were automatically graded by an online assessment tool, the Grade Grinder. More than 604,000 of the translations are in error, exemplifying a wide range of misunderstandings and confusions that students struggle with. The corpus thus provides a rich source of data for discovering how students learn logical concepts and for correlating error patterns with linguistic features. We describe the structure and content of the corpus in some detail, and discuss a range of potentially fruitful lines of enquiry. Our hope is that educational data mining of the corpus will lead to improved logic curricula and teaching practice.