Word-Embedding based Content Features for Automated Oral Proficiency Scoring

In this study, we develop content features for an automated scoring system for non-native English speakers' spontaneous speech. The features measure the lexical similarity between the question text and the ASR word hypothesis of the spoken response, using either traditional word vector models or word embeddings. The proposed features require no sample training responses for each question, a strong advantage since collecting question-specific data is expensive and sometimes impossible owing to concerns about question exposure. We explore the impact of these new features on the automated scoring of two question types: (a) providing opinions on familiar topics and (b) answering a question about stimulus material. The proposed features showed statistically significant correlations with oral proficiency scores, and combining the new features with the speech-driven features yielded a small but significant further improvement for the latter question type. Further analyses suggested that the new features were effective in assigning more accurate scores to responses with serious content issues.
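The core idea, a question-independent content feature computed as the similarity between the question text and the ASR hypothesis, can be sketched as follows. This is a minimal illustration assuming averaged word embeddings compared by cosine similarity; the toy two-dimensional vectors and function names are illustrative, not taken from the paper, and a real system would load pretrained embeddings (e.g. word2vec) and compare them against recognizer output.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors; 0.0 if either is all-zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def avg_vector(tokens, emb, dim):
    # Average the embeddings of in-vocabulary tokens (zero vector if none found).
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(component) / len(vecs) for component in zip(*vecs)]

def content_similarity(question, response, emb, dim):
    # Question-independent content feature: no per-question training responses
    # are needed, only the question text and the (ASR-hypothesized) response.
    q_vec = avg_vector(question.lower().split(), emb, dim)
    r_vec = avg_vector(response.lower().split(), emb, dim)
    return cosine(q_vec, r_vec)

# Toy 2-d embeddings; a real system would use pretrained word vectors.
emb = {
    "city": [1.0, 0.0],
    "town": [0.9, 0.1],
    "music": [0.0, 1.0],
    "song": [0.1, 0.9],
}
on_topic = content_similarity("describe your city", "i live in a small town", emb, 2)
off_topic = content_similarity("describe your city", "my favorite song", emb, 2)
```

An on-topic response should score noticeably higher than an off-topic one, which is what makes the feature useful for flagging responses with serious content issues.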
