Tools for External Plagiarism Detection in DOCODE

In this paper we describe the algorithms and tools offered by DOCODE, a system for plagiarism detection in educational institutions, with a special focus on external plagiarism detection using the Web as the source of information. Although DOCODE is a full-featured system built on several algorithms, our main contribution is an algorithm that, given a document, retrieves similar or related documents from the Web, thereby addressing the problem of external plagiarism detection. All of our algorithms work together to provide high-level plagiarism detection functionality to users. We therefore also describe how this functionality is bundled and presented in ad hoc Web-based user interfaces for different kinds of clients, supporting decision making about possible plagiarism cases.
