Extending Web Search for Online Plagiarism Detection

As information technologies advance, the amount of data gathered on the Internet grows at an incredibly rapid pace. To cope with this data overload, people commonly use Web search engines to find what they need. However, as search engines become more efficient and effective tools, plagiarists can grab, reassemble, and redistribute text content with little difficulty. In this paper, we develop an online detection system to reduce such misuse of search engines. Specifically, suspicious documents are extracted and verified through the collaboration of our plagiarism detection system and search engines. With a proper design, extracted text segments are assigned different priorities before being sent to search engines to ascertain plagiarism. This greatly reduces unnecessary and repetitive work when performing plagiarism detection.
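
The idea of prioritizing text segments before querying a search engine can be illustrated with a minimal sketch. The abstract does not specify the actual priority scheme, so everything here is an assumption made for illustration: the scoring heuristic (count of distinct words of six or more letters), the names `split_segments`, `priority`, and `check_document`, and the `search` callable, which stands in for whatever search engine interface is used and returns True when a matching page is found.

```python
import heapq
import re
from typing import Callable, List


def split_segments(text: str) -> List[str]:
    """Split text into sentence-like segments after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def priority(segment: str) -> float:
    """Hypothetical score: number of distinct words with 6+ letters.

    This heuristic is an assumption, not the scheme used in the paper.
    """
    words = re.findall(r"[A-Za-z']+", segment.lower())
    return float(len({w for w in words if len(w) >= 6}))


def check_document(
    text: str,
    search: Callable[[str], bool],
    max_queries: int = 3,
) -> List[str]:
    """Query the highest-priority segments first, up to max_queries.

    `search` is a caller-supplied stand-in for a search engine lookup.
    Returns the segments for which `search` reported a match.
    """
    # heapq is a min-heap, so negate the score; the index keeps
    # original order among segments with equal scores.
    heap = [(-priority(s), i, s) for i, s in enumerate(split_segments(text))]
    heapq.heapify(heap)

    hits: List[str] = []
    queries = 0
    while heap and queries < max_queries:
        _, _, segment = heapq.heappop(heap)
        queries += 1
        if search(segment):
            hits.append(segment)
    return hits
```

Capping the number of queries reflects the stated goal of avoiding unnecessary and repetitive work: low-priority segments are never sent unless the query budget allows it.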
