Atypical Inputs in Educational Applications

In large-scale educational assessments, automated scoring has recently become quite common. While the majority of student responses can be processed and scored without difficulty, a small number of responses exhibit atypical characteristics that make it difficult for an automated scoring system to assign a correct score. We describe a pipeline that detects and processes these kinds of responses at run-time. We present the most frequent types of these so-called non-scorable responses, along with effective filtering models based on various NLP and speech-processing technologies. We give an overview of two operational automated scoring systems, one for essay scoring and one for speech scoring, and describe the filtering models they use. Finally, we present an evaluation and analysis of the filtering models used for spoken responses in an assessment of language proficiency.
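To make the run-time routing concrete, the following is a minimal sketch of such a filtering pipeline. The filter functions, thresholds, and the route_response helper are illustrative assumptions rather than the models used in the operational systems described here; real filters rely on trained NLP and speech-processing models (e.g., ASR-based features and similarity models for off-topic detection).

```python
# Minimal sketch of a run-time filtering pipeline for non-scorable responses.
# All filter names, thresholds, and helpers below are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FilterResult:
    # Reason the response was flagged as non-scorable, or None if it can
    # proceed to the automated scoring model.
    flag: Optional[str]


def too_short_filter(response: str, prompt: str) -> Optional[str]:
    """Flag responses that are too short to score reliably."""
    return "too_short" if len(response.split()) < 5 else None


def off_topic_filter(response: str, prompt: str) -> Optional[str]:
    """Flag responses with very low lexical overlap with the prompt.

    A crude stand-in for similarity-based off-topic detection; an
    operational system would use trained models, not raw word overlap.
    """
    response_words = set(response.lower().split())
    prompt_words = set(prompt.lower().split())
    if not response_words:
        return "empty_response"
    overlap = len(response_words & prompt_words) / len(response_words)
    return "off_topic" if overlap < 0.05 else None


# Filters are applied in order; the first one that fires determines the flag.
FILTERS: List[Callable[[str, str], Optional[str]]] = [
    too_short_filter,
    off_topic_filter,
]


def route_response(response: str, prompt: str) -> FilterResult:
    """Run each filter in turn; a flagged response is routed to human
    scoring, otherwise it proceeds to the automated scorer."""
    for f in FILTERS:
        flag = f(response, prompt)
        if flag is not None:
            return FilterResult(flag=flag)
    return FilterResult(flag=None)


if __name__ == "__main__":
    result = route_response(
        "my favorite food is pizza and i eat it every single day with my friends",
        "Describe your last vacation in detail.",
    )
    print(result)  # FilterResult(flag='off_topic') -> send to a human rater
```

In this sketch, each filter is a binary decision, and any positive flag removes the response from the automated scoring path; the operational essay- and speech-scoring systems apply the same routing idea with model-based detectors for categories such as off-topic, non-English, and plagiarized responses.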
