Predicting children's reading ability using evaluator-informed features

Automatic reading assessment software faces the difficult task of modeling human-based observations, which have both objective and subjective components. In this paper, we mimic the grading patterns of a "ground-truth" (average) evaluator in order to produce models that agree with many people's judgments. We examine one particular reading task, in which children read a list of words aloud and evaluators rate each child's overall reading ability on a scale from one to seven. We first extract various features correlated with the specific cues that evaluators reported using. We then compare various supervised learning methods that map the most relevant features to the ground-truth evaluator scores. Our final system predicts these scores with a correlation of 0.91, higher than the average inter-evaluator agreement.
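
As a rough illustration of the prediction-and-evaluation setup described above (not the paper's actual feature set or model search), the following minimal Python sketch maps a placeholder feature matrix to ground-truth evaluator scores with a supervised regressor and reports agreement as Pearson correlation; the synthetic data, the choice of ridge regression, and the cross-validation setup are all illustrative assumptions.

```python
# Minimal sketch: supervised mapping from features to ground-truth evaluator
# scores, evaluated by Pearson correlation. The data here are synthetic and the
# ridge regressor stands in for whichever learner is being compared.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Placeholder features per child (n_children x n_features) and placeholder
# ground-truth (average-evaluator) scores on the 1-7 scale.
X = rng.normal(size=(200, 10))
y = np.clip(3.5 + 0.3 * (X @ rng.normal(size=10))
            + rng.normal(scale=0.5, size=200), 1, 7)

# Out-of-fold predictions so every child's score is predicted by a model
# that never saw that child during training.
y_pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=10)

# Agreement with the ground-truth evaluator, reported as Pearson correlation.
r, _ = pearsonr(y, y_pred)
print(f"Predicted vs. ground-truth correlation: {r:.2f}")
```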