Alessandro Cucchiarelli
DIIGA, Università Politecnica delle Marche, Ancona, Italy
Alex@diiga.univpm.it

INTRODUCTION

Essays are considered by many researchers to be the most useful tool to assess learning outcomes, since they imply a) the ability to recall, organize, and integrate ideas; b) the ability to express oneself in writing; and c) the ability to supply, rather than merely identify, interpretation and application of data. One of the difficulties of grading essays is the perceived subjectivity of the grading process. Many researchers claim that the subjective nature of essay assessment leads to variation in the grades awarded by different human assessors, which students perceive as a great source of unfairness. This issue may be addressed through the adoption of tools for Automated Essay Grading (AEG). An AEG system would at least be consistent in the way it scores essays, and enormous cost and time savings could be achieved if the system can be shown to grade essays within the range of those awarded by human assessors. Moreover, an AEG system would be an extremely useful and valuable tool for distance learning students needing to practice self-assessment on topics that cannot easily be covered by closed-answer tests.

Page (1996) introduced a distinction between grading essays for content and for style, where the former refers loosely to what an essay says, while the latter refers to "syntax and mechanics and diction and other aspects of the way it is said". The current literature on AEG systems discusses experiments with systems aimed at evaluating essays primarily for content or primarily for style, as well as systems that take both aspects into account (Valenti et al., 2003). Three different criteria have been used to measure the performance of AEG systems: accuracy of the results, multiple regression correlation, and percentage of agreement between the grades produced by the systems and those assigned by human experts (Valenti et al., 2003).

This paper discusses the design of an AEG system that we are developing at the Università Politecnica delle Marche. The system will initially be devoted to grading essays for content, and will be based on text classification techniques defined in the context of our research in Natural Language Processing (Cucchiarelli 2001, Velardi 2000). Text Classification (TC) is the problem of assigning predefined categories to free-text documents. The approach adopted relies on the availability of a large collection of documents that is used to train the classification system and to build the class profiles. In our approach, the TC system will be trained on a collection of human-graded essays to create models of the grading classes; the resulting model will then be used to classify previously unseen essays. The performance of the AEG system will be measured as the percentage of agreement between the grades it produces and those assigned by human experts to the unseen essays. Since no public-domain collection of essays is currently available, this paper also reports on our solution to this problem. Sketches of the envisaged pipeline and of the agreement measure are given below.

The paper is organized as follows: the first section provides some background information on text classification; then we present our approach to automated essay marking via text classification, along with an outline of the system under development.
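To make the pipeline concrete, the following is a minimal sketch of training a TC system on human-graded essays and classifying an unseen one, assuming a TF-IDF representation and a nearest-centroid classifier in the spirit of Rocchio [1]. The scikit-learn library, the placeholder essay texts, and the grade labels are all illustrative assumptions, not the implementation of the system described here.

```python
# Minimal sketch: grade classes as centroid profiles over TF-IDF vectors.
# All data below is hypothetical placeholder material.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestCentroid

# Hypothetical training data: human-graded essays paired with grade labels.
train_essays = [
    "a well organized argument with clear supporting evidence ...",
    "some relevant ideas but weak organization and little evidence ...",
    "a clear thesis with strong integration of ideas ...",
]
train_grades = ["A", "B", "A"]

# Represent each essay as a TF-IDF vector, as in classic IR-based TC.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_essays)

# Build one profile (centroid) per grade class and fit the classifier.
classifier = NearestCentroid()
classifier.fit(X_train, train_grades)

# Classify a previously unseen essay by its nearest class profile.
unseen = vectorizer.transform(["text of a new, ungraded essay ..."])
print(classifier.predict(unseen))
```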
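The agreement criterion itself reduces to a simple count over the unseen essays. The sketch below, with hypothetical grade lists, shows the computation.

```python
# Percentage of unseen essays on which the system's grade matches the
# human expert's grade. Both lists are hypothetical placeholders.
human_grades = ["A", "B", "B", "C", "A"]
system_grades = ["A", "B", "C", "C", "A"]

matches = sum(h == s for h, s in zip(human_grades, system_grades))
agreement = 100.0 * matches / len(human_grades)
print(f"Exact agreement: {agreement:.1f}%")  # 80.0% for this toy data
```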
REFERENCES

[1] J. J. Rocchio et al. Relevance feedback in information retrieval, 1971.
[2] Gerard Salton et al. The SMART Retrieval System—Experiments in Automatic Document Processing, 1971.
[3] G. Salton et al. Developments in Automatic Text Retrieval, Science, 1991.
[4] E. B. Page. Computer Grading of Student Prose, Using Modern Concepts and Software, 1994.
[5] David D. Lewis et al. Text categorization of low quality images, 1995.
[6] Leah S. Larkey et al. Automatic essay grading using text categorization techniques, SIGIR '98, 1998.
[7] Paola Velardi et al. A Theoretical Analysis of Context-based Learning Algorithms for Word Sense Disambiguation, ECAI, 2000.
[8] Robert Williams. Automated essay grading: an evaluation of four conceptual models, 2001.
[9] Paola Velardi et al. Unsupervised Named Entity Recognition Using Syntactic and Semantic Contextual Evidence, Computational Linguistics, 2001.
[10] Yiming Yang et al. An Evaluation of Statistical Approaches to Text Categorization, Information Retrieval, 1999.