iPlag: Intelligent Plagiarism Reasoner in scientific publications

Existing anti-plagiarism tools are, in effect, text-matching systems that do not make accurate judgments about plagiarism. Text that may acceptably be redundant and text that is properly cited are both highlighted as plagiarism, and the actual decision about plagiarism is left to the user. To reduce human input and allow greater reliance on automatic plagiarism detectors, we propose an Intelligent Plagiarism Reasoner (iPlag), which combines several analytical procedures. Scholarly documents under investigation are segmented into a logical tree-structured representation using a procedure called D-SEGMENT. Statistical methods are used to assign numerical weights to structural components under a technique called C-WEIGHT. Relevance ranking (R-RANK) and plagiarism screening (P-SCREEN) approaches are adjusted to incorporate structural weights, citation evidence, and syntax-based and semantic-based methods into plagiarism detection results. We encourage current plagiarism detection systems to adopt the proposed framework.
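
To make the idea of structure-aware scoring concrete, the following is a minimal sketch, not the authors' implementation. It assumes a document has already been segmented into a tree of titled sections, and it combines per-section similarity using per-section weights. The `Section` class, the `ASSUMED_WEIGHTS` table, the Jaccard word-overlap measure, and the function names are all hypothetical stand-ins chosen for illustration; the abstract states only that weights are assigned statistically and does not specify a similarity measure. Citation evidence and semantic matching are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """One node of a hypothetical document tree: a title, its own text, children."""
    title: str
    text: str = ""
    children: List["Section"] = field(default_factory=list)


def leaves(node):
    """Yield (lowercased title, text) for every node that carries text."""
    if node.text:
        yield node.title.lower(), node.text
    for child in node.children:
        yield from leaves(child)


def jaccard(a, b):
    """Word-set overlap between two strings, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


# Illustrative values only (an assumption): overlap in results counts
# for more than overlap in the introduction.
ASSUMED_WEIGHTS = {"introduction": 0.5, "methods": 1.0, "results": 2.0}


def weighted_similarity(suspect, source, weights=ASSUMED_WEIGHTS, default=1.0):
    """Weighted mean of per-section similarity between same-titled sections."""
    source_text = dict(leaves(source))
    total = 0.0
    weight_sum = 0.0
    for title, text in leaves(suspect):
        w = weights.get(title, default)
        weight_sum += w
        total += w * jaccard(text, source_text.get(title, ""))
    return total / weight_sum if weight_sum else 0.0
```

With these assumed weights, two documents that share only an introduction score lower than two that share only a results section, which is the kind of distinction a flat text matcher cannot make.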
