Automatic Detection of Off-Topic Spoken Responses Using Very Deep Convolutional Neural Networks

Test takers in high-stakes speaking assessments may try to inflate their scores by responding to a question they are more familiar with instead of the question presented in the test; such a response is referred to as an off-topic spoken response. These responses make it difficult to accurately evaluate a test taker's speaking proficiency and thus may reduce the validity of assessment scores. This study addresses the problem by building an automatic system that detects off-topic spoken responses, whose output can inform the downstream automated scoring pipeline. We propose a method that represents the comparison between a test response and the question used to elicit it as a similarity grid, and then applies very deep convolutional neural networks to the grid to determine different degrees of topic relevance. Specifically, we apply Inception networks to this task, and the experimental results demonstrate the effectiveness of the proposed method. Our system achieves an F1-score of 92.8% on the class of off-topic responses, significantly outperforming a baseline system that uses a range of word embedding-based similarity metrics (F1-score = 85.5%).
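The similarity-grid idea can be illustrated with a minimal sketch. It assumes the response (for example, its ASR transcript) and the question are already tokenized and that a word-embedding lookup is available. The toy two-dimensional vectors, the function names `cosine` and `similarity_grid`, and the fixed grid size with zero padding are hypothetical choices for illustration, not the authors' exact construction.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similarity_grid(response, question, embeddings, rows, cols):
    """Build a rows x cols grid where cell (i, j) is the cosine similarity
    between the i-th response word and the j-th question word.

    Assumption (hypothetical): out-of-vocabulary words and padding positions
    stay at 0.0, and longer inputs are truncated, so every grid has the same
    fixed shape and can be fed to a CNN like a one-channel image.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    for i, r_word in enumerate(response[:rows]):
        r_vec = embeddings.get(r_word)
        if r_vec is None:
            continue  # OOV response word: leave its row at 0.0
        for j, q_word in enumerate(question[:cols]):
            q_vec = embeddings.get(q_word)
            if q_vec is not None:
                grid[i][j] = cosine(r_vec, q_vec)
    return grid


if __name__ == "__main__":
    # Toy embeddings, made up purely for illustration.
    toy = {"city": [1.0, 0.0], "town": [0.8, 0.6], "food": [0.0, 1.0]}
    g = similarity_grid(["town", "food"], ["city", "food"], toy, rows=3, cols=3)
    print([[round(x, 2) for x in row] for row in g])
    # -> [[0.8, 0.6, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```

Each resulting grid could then be treated as a single-channel image and passed to a CNN classifier such as an Inception network; fixed-size zero padding is only one simple way to give all grids the same shape.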
