A New Hybrid Technique for Detection of Plagiarism from Text Documents

Plagiarism occurs when we use the ideas, expressions, work, or words of other authors without giving them the required attribution. A major contributing factor is the large amount of data and information on the internet that can be accessed quickly. The proposed system introduces an extrinsic plagiarism detection approach inspired by cognition: it uses semantic knowledge to detect the plagiarized parts of a text without human involvement. A lexical database such as WordNet helps computers interpret the data and information. Most current plagiarism detection systems fail to detect highly complex cases of plagiarism. The proposed system uses the Dice measure as the similarity measure for finding the semantic resemblance between a pair of sentences. It also uses linguistic features, such as path similarity and a depth estimation measure, to compute the resemblance between a pair of words, and it combines these features by assigning different weights to them. The system can identify cases such as restructuring, paraphrasing, verbatim copying, and synonymized plagiarism. It has been evaluated on the PAN-PC-11 corpus. The results indicate that the proposed system outperforms other existing systems on PAN-PC-11 in terms of precision, recall, F-measure, and PlagDet score. Although the approach is innovative, its results are close to those of existing systems while still being reasonably better.
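The abstract does not give the exact formulas or weights, so the following is only a minimal sketch of how a weighted word-level similarity and a Dice-style sentence similarity might fit together. Everything named here is an assumption made for illustration: the toy hypernym tree `PARENT` stands in for WordNet, a Wu-Palmer-style ratio stands in for the unspecified depth estimation measure, and the weights (0.4, 0.6) and the matching threshold (0.5) are hypothetical values, not the ones used by the proposed system.

```python
# Toy hypernym tree (child -> parent). Hypothetical stand-in for WordNet.
PARENT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "vehicle": "entity",
    "dog": "animal",
    "animal": "entity",
}


def ancestors(word):
    """Return the chain [word, parent, ..., root] in the toy tree."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(word):
    """Depth of a node, counting the root as depth 1."""
    return len(ancestors(word))


def lowest_common_ancestor(a, b):
    """Deepest node shared by both ancestor chains, or None."""
    chain_b = set(ancestors(b))
    for node in ancestors(a):
        if node in chain_b:
            return node
    return None


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    common = lowest_common_ancestor(a, b)
    if common is None:
        return 0.0
    edges = ancestors(a).index(common) + ancestors(b).index(common)
    return 1.0 / (1.0 + edges)


def depth_similarity(a, b):
    """Wu-Palmer-style ratio (assumed stand-in for the depth measure)."""
    common = lowest_common_ancestor(a, b)
    if common is None:
        return 0.0
    return 2.0 * depth(common) / (depth(a) + depth(b))


def word_similarity(a, b, w_path=0.4, w_depth=0.6):
    """Weighted combination of the two features (weights are assumed)."""
    return w_path * path_similarity(a, b) + w_depth * depth_similarity(a, b)


def dice_sentence_similarity(s1, s2, threshold=0.5):
    """Dice-style score: matched words from both sides over total words.

    A word counts as matched when its best word_similarity against the
    other sentence reaches the (assumed) threshold. With exact matches
    only, this reduces to the classic 2|A & B| / (|A| + |B|).
    """
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    if not t1 or not t2:
        return 0.0
    m1 = sum(1 for w in t1 if max(word_similarity(w, v) for v in t2) >= threshold)
    m2 = sum(1 for w in t2 if max(word_similarity(w, v) for v in t1) >= threshold)
    return (m1 + m2) / (len(t1) + len(t2))


if __name__ == "__main__":
    print(dice_sentence_similarity("the car is red", "the automobile is red"))
    print(dice_sentence_similarity("the car is red", "the dog is red"))
```

In this toy setup, the synonym pair "car" and "automobile" clears the threshold, so the synonymized sentence scores as high as a verbatim copy, whereas "car" and "dog" do not match and the score drops. A real system would replace the toy tree with WordNet lookups and would need word sense handling, which this sketch ignores.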
