Automatic detection of plagiarized spoken responses

This paper addresses the task of automatically detecting plagiarized responses in a test of spoken English proficiency for non-native speakers. A corpus of spoken responses containing plagiarized content was collected from a high-stakes assessment of English proficiency, and several text-to-text similarity metrics were implemented to compare these responses with a set of materials identified as likely sources of the plagiarized content. A classifier was then trained on these similarity metrics to predict whether a given spoken response is plagiarized. Evaluated on a data set containing both the responses with plagiarized content and non-plagiarized control responses, the classifier achieved accuracies of 92.0% using transcriptions and 87.1% using ASR output, against a baseline accuracy of 50.0%.
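As a rough illustration of how text-to-text similarity features of this kind might be computed, the sketch below scores a response transcript against a set of candidate source texts using clipped word n-gram precision and keeps the maximum over sources for each n-gram order. This is an assumption-laden example, not the paper's method: the abstract does not specify the metrics, and the function names (`ngram_precision`, `max_similarity`, `similarity_features`), the whitespace tokenization, and the choice of n = 1, 2, 3 are all hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter (multiset) of the word n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(response, source, n=2):
    """Fraction of the response's n-grams that also occur in the source.

    Counts are clipped, so a repeated n-gram is credited only as often
    as it appears in the source. Tokenization is plain lowercased
    whitespace splitting (an assumption made for this sketch).
    """
    resp = ngrams(response.lower().split(), n)
    src = ngrams(source.lower().split(), n)
    total = sum(resp.values())
    if total == 0:
        return 0.0  # response shorter than n words
    overlap = sum(min(count, src[gram]) for gram, count in resp.items())
    return overlap / total


def max_similarity(response, sources, n=2):
    """Highest n-gram precision of the response against any source text."""
    return max((ngram_precision(response, s, n) for s in sources), default=0.0)


def similarity_features(response, sources):
    """Hypothetical feature vector: max precision for n = 1, 2, 3."""
    return [max_similarity(response, sources, n) for n in (1, 2, 3)]


if __name__ == "__main__":
    sources = ["the quick brown fox jumps over the lazy dog"]
    print(similarity_features("the quick brown fox", sources))
    print(similarity_features("i like my city very much", sources))
```

A feature vector like this could then be passed to any standard classifier. Taking the maximum over sources reflects the idea that a response only needs to match one source closely to be suspicious; whether the paper aggregates scores this way is not stated in the abstract.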
