Off-Topic Spoken Response Detection with Word Embeddings

In this study, we developed an automated off-topic response detection system as a supplementary module for an automated proficiency scoring system for non-native English speakers' spontaneous speech. Given a spoken response, the system first generates an automated transcription using an ASR system trained on non-native speech, and then generates a set of features that assess the similarity of the response to the question. In contrast to previous studies, which required a large set of training responses for each question, the proposed system requires only the question text. This makes the system more practical, since new questions can be added to a test dynamically. However, questions are typically short, and the traditional approach based on exact word matching does not perform well on them. To address this issue, we used a set of features based on neural embeddings and a convolutional neural network (CNN). A system based on the combination of all features achieved an accuracy of 87% on a balanced dataset, which was substantially higher than the accuracy of a baseline system using question-based vector space models (49%). Additionally, this system came close to the accuracy of a vector space-based model that uses a large set of responses to the test questions (93%).
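The contrast between exact word matching and embedding-based similarity can be illustrated with a minimal sketch. It is not the feature set used in the study: the three-dimensional vectors in `TOY_EMBEDDINGS`, the example sentences, and all function names are hypothetical and chosen only to show why a short question can share no words with an on-topic response and still be close to it in embedding space. A real system would presumably use pretrained embeddings and ASR transcriptions instead.

```python
import math
from collections import Counter

# Hypothetical toy embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "describe": [0.10, 0.90, 0.00],
    "favorite": [0.20, 0.80, 0.10],
    "city":     [0.90, 0.10, 0.00],
    "love":     [0.25, 0.75, 0.10],
    "town":     [0.85, 0.15, 0.05],
    "cook":     [0.00, 0.20, 0.90],
    "pasta":    [0.05, 0.05, 0.95],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def exact_match_cosine(question, response):
    """Cosine similarity over raw word counts (exact word matching)."""
    q_counts = Counter(tokenize(question))
    r_counts = Counter(tokenize(response))
    vocab = sorted(set(q_counts) | set(r_counts))
    return cosine([q_counts[w] for w in vocab], [r_counts[w] for w in vocab])


def mean_embedding(text, embeddings):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def embedding_cosine(question, response, embeddings=TOY_EMBEDDINGS):
    """Cosine similarity between averaged word embeddings of the two texts."""
    return cosine(mean_embedding(question, embeddings),
                  mean_embedding(response, embeddings))


QUESTION = "describe your favorite city"
ON_TOPIC = "i love my town"
OFF_TOPIC = "i cook pasta"

for label, response in [("on-topic", ON_TOPIC), ("off-topic", OFF_TOPIC)]:
    print(label,
          round(exact_match_cosine(QUESTION, response), 3),
          round(embedding_cosine(QUESTION, response), 3))
```

With these toy vectors, exact matching scores both responses as zero because neither shares a word with the question, while the averaged-embedding similarity separates them. Scores of this kind could serve as input features to a classifier; the threshold or model that would make the final on-topic/off-topic decision is outside the scope of this sketch.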
