Improvements in predicting children's overall reading ability by modeling variability in evaluators' subjective judgments

Automatic literacy assessment is one promising application of speech and language processing research. In our previous work, we showed that we could accurately predict children's overall ability to read a list of English words aloud, an integral component of early literacy assessment. In this paper, we improve upon those results by exploiting the fact that the evaluators' level of agreement varies significantly depending on the child being judged. We model this source of evaluator variability directly using generalized least squares linear regression. In this framework, children whom the evaluators rated with more confidence are weighted more heavily. Performance in predicting the mean evaluator's scores increases from a Pearson's correlation coefficient of 0.946 to 0.952, a relative improvement of 0.63%. This correlation is significantly higher than the mean inter-evaluator agreement of 0.899 (p < 0.05). Critically, the mean and maximum absolute errors are also significantly reduced.
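
As an illustration of the weighting idea described above, the following is a minimal sketch and not the authors' implementation. It assumes the error covariance in generalized least squares is diagonal, so the fit reduces to weighted least squares, and it assumes each child's weight is the inverse of the sample variance of that child's evaluator scores. The abstract does not state how the weights were derived, so this choice, the `var_floor` parameter, and the function names are hypothetical.

```python
import numpy as np


def fit_weighted_least_squares(X, ratings, var_floor=1e-3):
    """Fit a linear model to the mean evaluator score of each child.

    X       : (n_children, n_features) feature matrix.
    ratings : (n_children, n_evaluators) scores from each evaluator.
    Assumption: weight = 1 / variance of a child's evaluator scores,
    floored at var_floor so unanimous ratings do not give infinite weight.
    """
    X = np.asarray(X, dtype=float)
    ratings = np.asarray(ratings, dtype=float)

    y = ratings.mean(axis=1)                 # target: mean evaluator score
    var = ratings.var(axis=1, ddof=1)        # evaluator disagreement per child
    weights = 1.0 / np.maximum(var, var_floor)

    # Add an intercept column, then scale rows by sqrt(weight) so that
    # ordinary least squares on the scaled system minimizes the weighted loss.
    A = np.column_stack([np.ones(len(y)), X])
    sqrt_w = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(A * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta, weights


def predict(beta, X):
    """Predict scores from fitted coefficients (beta[0] is the intercept)."""
    X = np.asarray(X, dtype=float)
    return beta[0] + X @ beta[1:]
```

Under these assumptions, a child on whom the evaluators agree closely contributes more to the fitted coefficients than a child on whom they disagree, which is the behavior the abstract describes.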
