Reducing Annotation Effort in Automatic Essay Evaluation Using Locality Sensitive Hashing

Automated essay evaluation systems use machine learning models to predict the score of an essay. Such models require a training set of essays, which is usually scored by human annotators at a considerable cost in time. A popular choice for scoring is a nearest neighbor model, which requires on-line computation of the nearest neighbors of a given essay; this computation is itself time-consuming. In this work, we propose to use locality sensitive hashing to select, from a large set of essays, a small subset that is likely to contain the nearest neighbors of a given essay. We conducted experiments on real-world data sets provided by Kaggle. According to the experimental results, the proposed approach achieves good scoring performance. The proposed approach is efficient with regard to time complexity. It also works well when only a small number of training essays are labeled by humans, giving results comparable to those obtained when large essay sets are used.
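The candidate-selection idea can be illustrated with a minimal sketch. The abstract does not state which hash family or essay representation is used, so everything below is an assumption made for illustration: essays are taken to be fixed-length numeric feature vectors, the hash is random-hyperplane (sign) hashing for cosine similarity, the score is predicted as the mean score of the k most similar candidates, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (assumed hash family: sign hashing)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: the side of the plane on which vec falls."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group essay indices into buckets keyed by their hash signature."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[signature(vec, hyperplanes)].append(idx)
    return buckets


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def predict_score(query, vectors, scores, buckets, hyperplanes, k=3):
    """Score a query essay using only the essays that share its bucket.

    Returns None when the bucket is empty (no candidate neighbors found).
    """
    candidates = buckets.get(signature(query, hyperplanes), [])
    if not candidates:
        return None
    # Exact similarity is computed only over the small candidate subset.
    nearest = sorted(
        candidates, key=lambda i: cosine(query, vectors[i]), reverse=True
    )[:k]
    return sum(scores[i] for i in nearest) / len(nearest)


if __name__ == "__main__":
    # Toy data, purely illustrative: three "essays" with human-assigned scores.
    essays = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    human_scores = [2.0, 4.0, 6.0]
    planes = make_hyperplanes(num_bits=4, dim=3, seed=42)
    index = build_index(essays, planes)
    print(predict_score([0.0, 1.0, 0.0], essays, human_scores, index, planes, k=1))
```

In this sketch, more hash bits give smaller buckets and faster queries at the risk of missing true neighbors; practical systems commonly use several hash tables to offset that risk, which is omitted here for brevity.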