Academic Plagiarism Detection

This article summarizes the research on computational methods to detect academic plagiarism by systematically reviewing 239 research papers published between 2013 and 2018. To structure the presentation of the research contributions, we propose novel technically oriented typologies for plagiarism prevention and detection efforts, the forms of academic plagiarism, and computational plagiarism detection methods. We show that academic plagiarism detection is a highly active research field. Over the period we review, the field has seen major advances in the automated detection of strongly obfuscated and thus hard-to-identify forms of academic plagiarism. These improvements mainly originate from better semantic text analysis methods, the investigation of non-textual content features, and the application of machine learning. We identify a research gap: methodologically thorough performance evaluations of plagiarism detection systems are lacking. Based on our analysis, we see the integration of heterogeneous analysis methods for textual and non-textual content features using machine learning as the most promising area for future research contributions to further improve the detection of academic plagiarism.

[1]  Carl Vogel,et al.  Style-based distance features for author verification - Notebook for PAN at CLEF 2013. , 2013 .

[2]  Ondrej Veselý,et al.  Source Retrieval via Naïve Approach and Passage Selection Heuristics Notebook for PAN at CLEF2013 , 2013, CLEF.

[3]  Azadeh Shakery,et al.  Expanded N-Grams for Semantic Text Alignment Notebook for PAN at CLEF 2014 , 2014, CLEF.

[4]  Steven Bethard,et al.  DLS@CU: Sentence Similarity from Word Alignment , 2014, *SEMEVAL.

[5]  Lee Gillam,et al.  A Textual Modus Operandi: Surrey's Simple System for Author Identification Notebook for PAN at CLEF 2013 , 2013, CLEF.

[6]  Georgiana Dinu,et al.  Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors , 2014, ACL.

[7]  Samee U. Khan,et al.  A literature review on the state-of-the-art in patent analysis , 2014 .

[8]  Daniel Castro-Castro,et al.  Authorship Verification, combining Linguistic Features and Different Similarity Functions , 2015, CLEF.

[9]  Roman Kern,et al.  Towards Authorship Attribution for Bibliometrics using Stylometric Features , 2015, CLBib@ISSI.

[10]  Benno Stein,et al.  An Evaluation Framework for Plagiarism Detection , 2010, COLING.

[11]  Mark Stevenson,et al.  Hashing and Merging Heuristics for Text Reuse Detection , 2014, CLEF.

[12]  Deepa Gupta,et al.  Plagiarism detection in text documents using sentence bounded stop word n-grams , 2016 .

[13]  Jimmy J. Lin,et al.  UMD-TTIC-UW at SemEval-2016 Task 1: Attention-Based Multi-Perspective Convolutional Neural Networks for Textual Similarity Measurement , 2016, *SEMEVAL.

[14]  Magdalena Jankowska,et al.  Ensembles of Proximity-Based One-Class Classifiers for Author Verification Notebook for PAN at CLEF 2014 , 2014, CLEF.

[15]  Iván V. Meza,et al.  Homotopy Based Classification for Author Verification Task: Notebook for PAN at CLEF 2015 , 2015, CLEF.

[16]  Deepa Gupta,et al.  Study on extrinsic text plagiarism detection techniques and tools , 2016 .

[17]  Norman Meuschke,et al.  Citation pattern matching algorithms for citation-based plagiarism detection: greedy citation tiling, citation chunking and longest common citation sequence , 2011, DocEng '11.

[18]  Steven Bethard,et al.  Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence , 2014, TACL.

[19]  Vadim V. Strijov,et al.  Methods for Intrinsic Plagiarism Detection and Author Diarization , 2016, CLEF.

[20]  Mirco Kocher UniNE at CLEF 2016: Author Clustering , 2016, CLEF.

[21]  John C. Henderson,et al.  MITRE: Seven Systems for Semantic Similarity in Tweets , 2015, *SEMEVAL.

[22]  T. Solorio,et al.  Machine Translation Evaluation Metric for Text Alignment Notebook for PAN at CLEF 2014 , 2014 .

[23]  Debora Weber-Wulff,et al.  False Feathers: A Perspective on Academic Plagiarism , 2014 .

[24]  Houda Alberts Author clustering with the Aid of a Simple Distance Measure , 2017, CLEF.

[25]  José Guilherme Camargo de Souza,et al.  FBK HLT-MT at SemEval-2016 Task 1: Cross-lingual Semantic Similarity Measurement Using Quality Estimation Features and Compositional Bilingual Word Embeddings , 2016, *SEMEVAL.

[26]  Pierre-François Marteau,et al.  Sentence Similarity by Combining Explicit Semantic Analysis and Overlapping N-Grams , 2014, TSD.

[27]  Youssef Iraqi,et al.  A Slightly-modified GI-based Author-verifier with Lots of Features (ASGALF) , 2014, CLEF.

[28]  L. Gleitman,et al.  Language and thought , 2005 .

[29]  Paolo Rosso,et al.  A Knowledge-based Representation for Cross-Language Document Retrieval and Categorization , 2014, EACL.

[30]  Abdelmajid Ben Hamadou,et al.  Supervised Learning to Measure the Semantic Similarity Between Arabic Sentences , 2015, ICCCI.

[31]  Norman Meuschke,et al.  Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space , 2014, IEEE/ACM Joint Conference on Digital Libraries.

[32]  Norman Meuschke,et al.  Citation‐based plagiarism detection: Practicability on a large‐scale scientific corpus , 2014, J. Assoc. Inf. Sci. Technol..

[33]  Deepa Gupta,et al.  Using K-means cluster based techniques in external plagiarism detection , 2014, 2014 International Conference on Contemporary Computing and Informatics (IC3I).

[34]  Azadeh Shakery,et al.  Using a Dictionary and n-gram Alignment to Improve Fine-grained Cross-Language Plagiarism Detection , 2016, DocEng.

[35]  Sangeetha Jamal,et al.  An Improved SRL Based Plagiarism Detection Technique Using Sentence Ranking , 2015 .

[36]  Goran Hrovat,et al.  Establishing of a Slovenian open access infrastructure: a technical point of view , 2014, Program.

[37]  Arun Jayapal,et al.  Vector Space Model and Overlap Metric for Author Identification Notebook for PAN at CLEF 2013 , 2013, CLEF.

[38]  Paolo Rosso,et al.  Building Arabic corpora from Wikisource , 2013, 2013 ACS International Conference on Computer Systems and Applications (AICCSA).

[39]  Benno Stein,et al.  Strategies for retrieving plagiarized documents , 2007, SIGIR.

[40]  Grigori Sidorov,et al.  Author Verification Using Syntactic N-grams: Notebook for PAN at CLEF 2015 , 2015, CLEF.

[41]  Dilip Kumar Sharma,et al.  A state of art on source code plagiarism detection , 2016, 2016 2nd International Conference on Next Generation Computing Technologies (NGCT).

[42]  El Habib Benlahmar,et al.  Survey of Plagiarism Detection Approaches and Big data Techniques related to Plagiarism Candidate Retrieval , 2017, BDCA.

[43]  Deepa Gupta,et al.  Using Natural Language Processing techniques and fuzzy-semantic similarity for automatic external plagiarism detection , 2014, 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI).

[44]  Matthias Hagen,et al.  Overview of the 1st international competition on plagiarism detection , 2009 .

[45]  Carl Vogel,et al.  Author Verification: Basic Stacked Generalization Applied To Predictions from a Set of Heterogeneous Learners - Notebook for PAN at CLEF 2015 , 2015, CLEF.

[46]  Lucia Vardanega,et al.  Is plagiarism changing over time? A 10-year time-lag study with three points of measurement , 2016 .

[47]  Ayu Purwarianti,et al.  Experiments on the Indonesian plagiarism detection using latent semantic analysis , 2014, 2014 2nd International Conference on Information and Communication Technology (ICoICT).

[48]  Rita Kuznetsova,et al.  Style Breach Detection with Neural Sentence Embeddings , 2017, CLEF.

[49]  Diego Antonio Rodríguez Torrejón,et al.  Text Alignment Module in CoReMo 2.1 Plagiarism Detector Notebook for PAN at CLEF 2013 , 2013, CLEF.

[50]  Elizabeth Wager Defining and responding to plagiarism , 2014, Learn. Publ..

[51]  Frantz Rowe,et al.  What literature review is not: diversity, boundaries and recommendations , 2014, Eur. J. Inf. Syst..

[52]  Moritz Schubotz,et al.  Analyzing Semantic Concept Patterns to Detect Academic Plagiarism , 2017, WOSP@JCDL.

[53]  Grigori Sidorov,et al.  Dynamically Adjustable Approach through Obfuscation Type Recognition , 2015, CLEF.

[54]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[55]  Yurii Palkovskii,et al.  Using Hybrid Similarity Methods for Plagiarism Detection Notebook for PAN at CLEF 2013 , 2013, CLEF.

[56]  Victoria Bobicev Authorship Detection with PPM Notebook for PAN at CLEF 2013 , 2013, CLEF.

[57]  Norman Meuschke,et al.  State-of-the-art in detecting academic plagiarism , 2013 .

[58]  Mingxing Wang,et al.  Approaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013 , 2013, CLEF.

[59]  Matthias Hagen,et al.  Author Obfuscation: Attacking the State of the Art in Authorship Verification , 2016, CLEF.

[60]  Tommaso Caselli,et al.  FBK-TR: SVM for Semantic Relatedeness and Corpus Patterns for RTE , 2014, SemEval@COLING.

[61]  Naomie Salim,et al.  Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[62]  Muazzam Ahmed Siddiqui,et al.  Query Optimization in Arabic Plagiarism Detection: An Empirical Study , 2014 .

[63]  M. R. Ghaeini Intrinsic Author Identification Using Modified Weighted KNN Notebook for PAN at CLEF 2013 , 2013, CLEF.

[64]  Christian Hänig,et al.  ExB Themis: Extensive Feature Extraction from Word Alignments for Semantic Textual Similarity , 2015, *SEMEVAL.

[65]  Joseph Clare,et al.  How Prevalent is Contract Cheating and to What Extent are Students Repeat Offenders? , 2017 .

[66]  Xindong Wu,et al.  Authorship identification from unstructured texts , 2014, Knowl. Based Syst..

[67]  Carl Vogel,et al.  Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm - Notebook for PAN at CLEF 2014 , 2014, CLEF.

[68]  Tomas Brychcin,et al.  UWB at SemEval-2016 Task 1: Semantic Textual Similarity using Lexical, Syntactic, and Semantic Information , 2016, *SEMEVAL.

[69]  Darnes Vilariño Ayala,et al.  Lexical-Syntactic and Graph-Based Features for Authorship Verification Notebook for PAN at CLEF 2013 , 2013, CLEF.

[70]  Anne E. James,et al.  Intrinsic Plagiarism Detection Using Latent Semantic Indexing and Stylometry , 2013, 2013 Sixth International Conference on Developments in eSystems Engineering.

[71]  Malvina Nissim,et al.  GLAD: Groningen Lightweight Authorship Detection Notebook for PAN at CLEF 2015 , 2015 .

[72]  Hongfang Liu,et al.  MayoNLP at SemEval-2016 Task 1: Semantic Textual Similarity based on Lexical Semantic Net and Deep Learning Semantic Model , 2016, SemEval@NAACL-HLT.

[73]  Martin Steinebach,et al.  Authorship Verification via k-Nearest Neighbor Estimation Notebook for PAN at CLEF 2013 , 2013, CLEF.

[74]  Michal Brandejs,et al.  Improving Synoptic Querying for Source Retrieval , 2015 .

[75]  Günther Specht,et al.  Using Grammar-Profiles to Intrinsically Expose Plagiarism in Text Documents , 2013, NLDB.

[76]  Dinh Dien,et al.  Vietnamese plagiarism detection method , 2016, SoICT.

[77]  Parth Gupta,et al.  Cross-Language Plagiarism Detection Using a Multilingual Semantic Network , 2013, ECIR.

[78]  Deepa Gupta,et al.  Exploration of Fuzzy C Means Clustering Algorithm in External Plagiarism Detection System , 2016 .

[79]  K. Holyoak,et al.  The Cambridge handbook of thinking and reasoning , 2005 .

[80]  Iván V. Meza,et al.  A Single Author Style Representation for the Author Verification Task , 2014, CLEF.

[81]  Shachar Seidman,et al.  Authorship Verification Using the Impostors Method Notebook for PAN at CLEF 2013 , 2013, CLEF.

[82]  Steven Bethard,et al.  DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition , 2015, *SEMEVAL.

[83]  Didier Schwab,et al.  Deep Investigation of Cross-Language Plagiarism Detection Methods , 2017, BUCC@ACL.

[84]  Ayu Purwarianti,et al.  Detailed Analysis of Extrinsic Plagiarism Detection System Using Machine Learning Approach (Naive Bayes and SVM) , 2014 .

[85]  O. Haggag,et al.  Plagiarism Candidate Retrieval Using Selective Query Formulation and Discriminative Query Scoring Notebook for PAN at CLEF 2013 , 2013, CLEF.

[86]  Eshetie Berhan,et al.  Text Similarity Based on Data Compression in Arabic , 2014 .

[87]  Naomie Salim,et al.  An improved semantic plagiarism detection scheme based on Chi-squared automatic interaction detection , 2013, 2013 INTERNATIONAL CONFERENCE ON COMPUTING, ELECTRICAL AND ELECTRONIC ENGINEERING (ICCEEE).

[88]  Yaakov HaCohen-Kerner,et al.  Rapid detection of similar peer-reviewed scientific papers via constant number of randomized fingerprints , 2017, Inf. Process. Manag..

[89]  Victor I. Chang,et al.  An integrated approach for intrinsic plagiarism detection , 2017, Future Gener. Comput. Syst..

[90]  Man Yan Miranda Chong,et al.  A study on plagiarism detection and plagiarism direction identification using natural language processing techniques , 2013 .

[91]  Reda Mohamed Hamou,et al.  Machine learning tool and meta-heuristic based on genetic algorithms for plagiarism detection over mail service , 2014, 2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS).

[92]  Guy Paré,et al.  Synthesizing information systems knowledge: A typology of literature reviews , 2015, Inf. Manag..

[93]  Patrik Hrkut,et al.  Current Trends in Source Code Analysis, Plagiarism Detection and Issues of Analysis Big Datasets , 2017 .

[94]  Jöran Beel,et al.  Comparative evaluation of text- and citation-based plagiarism detection approaches using guttenplag , 2011, JCDL '11.

[95]  Pushpak Bhattacharyya,et al.  CFILT-CORE: Semantic Textual Similarity using Universal Networking Language , 2013, *SEM@NAACL-HLT.

[96]  Erkay Savas,et al.  Efficient top-k similarity document search utilizing distributed file systems and cosine similarity , 2015, Cluster Computing.

[97]  Douglas Bagnall,et al.  Authorship Clustering using Multi-headed Recurrent Neural Networks , 2016, CLEF.

[98]  Timothy W. Finin,et al.  Ebiquity: Paraphrase and Semantic Similarity in Twitter using Skipgrams , 2015, *SEMEVAL.

[99]  Diego Antonio Rodríguez Torrejón,et al.  CoReMo 2.3 Plagiarism Detector Text Alignment Module - Notebook for PAN at CLEF 2014 , 2014, CLEF.

[100]  Anne E. James,et al.  An Integrated Machine Learning Approach for Extrinsic Plagiarism Detection , 2016, 2016 9th International Conference on Developments in eSystems Engineering (DeSE).

[101]  Demetrios G. Glinos A Hybrid Architecture for Plagiarism Detection Notebook for PAN at CLEF 2014 , 2014 .

[102]  Mark Stevenson,et al.  An IR-Based Approach Utilizing Query Expansion for Plagiarism Detection in MEDLINE , 2017, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[103]  Naomie Salim,et al.  Existing plagiarism detection techniques: A systematic mapping of the scholarly literature , 2015, Online Inf. Rev..

[104]  J Beall,et al.  Best practices for scholarly authors in the age of predatory journals. , 2016, Annals of the Royal College of Surgeons of England.

[105]  Ashraf Saad Hussein A Plagiarism Detection System for Arabic Documents , 2014, IEEE Conf. on Intelligent Systems.

[106]  Michiel van Dam A Basic Character N-gram Approach to Authorship Verification Notebook for PAN at CLEF 2013 , 2013, CLEF.

[107]  Moshe Koppel,et al.  Determining if two documents are written by the same author , 2014, J. Assoc. Inf. Sci. Technol..

[108]  Rakian Shima,et al.  A PERSIAN FUZZY PLAGIARISM DETECTION APPROACH , 2015 .

[109]  Paolo Rosso,et al.  Counting Co-occurrences in Citations to Identify Plagiarised Text Fragments , 2013, CLEF.

[110]  Michel Simard,et al.  CNRC at SemEval-2016 Task 1: Experiments in Crosslingual Semantic Textual Similarity , 2016, SemEval@NAACL-HLT.

[111]  Dhruba Kumar Bhattacharyya,et al.  Plagiarism: Taxonomy, Tools and Detection Techniques , 2018, ArXiv.

[112]  Ashraf S. Hussein Arabic document similarity analysis using n-grams and singular value decomposition , 2015, 2015 IEEE 9th International Conference on Research Challenges in Information Science (RCIS).

[113]  Laurent Besacier,et al.  Using Word Embedding for Cross-Language Plagiarism Detection , 2017, EACL.

[114]  Douglas Bagnall,et al.  Author Identification Using Multi-headed Recurrent Neural Networks , 2015, CLEF.

[115]  Paolo Rosso,et al.  Determining and characterizing the reused text for plagiarism detection , 2013, Expert Syst. Appl..

[116]  Kim-Kwang Raymond Choo,et al.  Bit-level n-gram based forensic authorship analysis on social media: Identifying individuals from linguistic profiles , 2016, J. Netw. Comput. Appl..

[117]  Rasim M. Alguliyev,et al.  PDLK: Plagiarism detection using linguistic knowledge , 2015, Expert Syst. Appl..

[118]  Parth Gupta,et al.  Knowledge Graphs as Context Models: Improving the Detection of Cross-Language Plagiarism with Paraphrasing , 2013, PROMISE Winter School.

[120]  Anuj Saini,et al.  Anuj@DPIL-FIRE2016: A Novel Paraphrase Detection Method in Hindi Language using Machine Learning , 2016, FIRE.

[121]  Kayvan Bijari,et al.  A Deep Learning Approach to Persian Plagiarism Detection , 2016, FIRE.

[122]  Laurent Besacier,et al.  CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity , 2017, *SEMEVAL.

[123]  Mark Stevenson,et al.  Plagiarism Detection in Texts Obfuscated with Homoglyphs , 2017, ECIR.

[124]  Juan D. Velásquez,et al.  Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style , 2013, Expert Syst. Appl..

[125]  Grigori Sidorov,et al.  A Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014 , 2014, CLEF.

[126]  Tuomo Kakkonen,et al.  Automatic Student Plagiarism Detection: Future Perspectives , 2010 .

[127]  Paolo Rosso,et al.  Semantically-informed distance and similarity measures for paraphrase plagiarism identification , 2018, J. Intell. Fuzzy Syst..

[128]  Chris Callison-Burch,et al.  A Lightweight and High Performance Monolingual Word Aligner , 2013, ACL.

[129]  T. Dharani,et al.  A survey on content based image retrieval , 2013, 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering.

[130]  Sumam Mary Idicula,et al.  Fingerprinting based detection system for identifying plagiarism in Malayalam text documents , 2015, 2015 International Conference on Computing and Network Communications (CoCoNet).

[131]  Benno Stein,et al.  Overview of PAN'17 - Author Identification, Author Profiling, and Author Obfuscation , 2017, CLEF.

[132]  Günther Specht,et al.  Detecting Plagiarism in Text Documents through Grammar-Analysis of Authors , 2013, BTW.

[133]  Andreas Nürnberger,et al.  Web-based Demonstration of Semantic Similarity Detection Using Citation Pattern Visualization for a Cross Language Plagiarism Case , 2014, ICEIS.

[134]  Martyna Spiewak,et al.  OPI-JSA at CLEF 2017: Author Clustering and Style Breach Detection , 2017, CLEF.

[135]  Bela Gipp Citation-based Plagiarism Detection - Detecting Disguised and Cross-language Plagiarism using Citation Pattern Analysis , 2014 .

[136]  Farzin Yaghmaee,et al.  Automatic external Persian plagiarism detection using vector space model , 2014, 2014 4th International Conference on Computer and Knowledge Engineering (ICCKE).

[137]  Deepa Gupta,et al.  Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system , 2015, 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI).

[138]  Salha M. Alzahrani,et al.  Arabic Plagiarism Detection Using Word Correlation in N-Grams with K-Overlapping Approach, Working Notes for PAN-AraPlagDet at FIRE 2015 , 2015, FIRE Workshops.

[139]  Jacques Savoy,et al.  UniNE at CLEF 2017: Author Clustering , 2017, CLEF.

[140]  Grigori Sidorov,et al.  A Graph Based Authorship Identification Approach: Notebook for PAN at CLEF 2015 , 2015, CLEF.

[141]  Rune Borge Kalleberg Towards Detecting Textual Plagiarism Using Machine Learning Methods , 2015 .

[142]  Paolo Rosso,et al.  Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016 , 2016, FIRE.

[143]  Philip M. Newton,et al.  How Common Is Commercial Contract Cheating in Higher Education and Is It Increasing? A Systematic Review , 2018, Front. Educ..

[144]  Paolo Rosso,et al.  A New Corpus for the Evaluation of Arabic Intrinsic Plagiarism Detection , 2013, CLEF.

[145]  Roshan G. Ragel,et al.  AntiPlag: Plagiarism detection on electronic submissions of text based assignments , 2013, 2013 IEEE 8th International Conference on Industrial and Information Systems.

[146]  Jacek Kitowski,et al.  Optimisation of Character n-gram Profiles Method for Intrinsic Plagiarism Detection , 2014, ICAISC.

[147]  Jean-Gabriel Ganascia,et al.  Automatic detection of reuses and citations in literary texts , 2014, Lit. Linguistic Comput..

[148]  Jing Zhang,et al.  Source Retrieval and Text Alignment Corpus Construction for Plagiarism Detection , 2015, CLEF.

[149]  Bill Keller,et al.  Twitter Paraphrase Identification with Simple Overlap Features and SVMs , 2015, *SEMEVAL.

[150]  Parth Gupta,et al.  Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language , 2016, Knowl. Based Syst..

[151]  Mihai Datcu,et al.  Authorship analysis based on data compression , 2014, Pattern Recognit. Lett..

[152]  Rafael-Michael Karampatsis,et al.  CDTDS: Predicting Paraphrases in Twitter via Support Vector Regression , 2015, *SEMEVAL.

[153]  Pashutan Modaresi,et al.  A Language Independent Author Verifier Using Fuzzy C-Means Clustering , 2014, CLEF.

[154]  C. Lee Giles,et al.  Supervised Ranking for Plagiarism Source Retrieval Notebook for PAN at CLEF 2014 , 2014 .

[155]  Juan D. Velásquez,et al.  Docode 5: Building a real-world plagiarism detection system , 2017, Eng. Appl. Artif. Intell..

[156]  Shuai Wang,et al.  Combination of VSM and Jaccard coefficient for external plagiarism detection , 2013, 2013 International Conference on Machine Learning and Cybernetics.

[157]  Kayvan Bijari,et al.  Graph-based Approach to Text Alignment for Plagiarism Detection in Persian Documents , 2016, FIRE.

[158]  Mark Stevenson,et al.  A Machine Learning-based Intrinsic Method for Cross-topic and Cross-genre Authorship Verification , 2015, CLEF.

[159]  Magdalena Jankowska,et al.  Proximity Based One-class Classification with Common N-Gram Dissimilarity for Authorship Verification Task Notebook for PAN at CLEF 2013 , 2013, CLEF.

[160]  Heri Ramampiaro,et al.  A Deep Network Model for Paraphrase Detection in Short Text Messages , 2017, Inf. Process. Manag..

[161]  Iván V. Meza,et al.  Distance Learning for Author Verification Notebook for PAN at CLEF 2013 , 2013, CLEF.

[162]  Graeme Hirst,et al.  Authorship Verification with Entity Coherence and Other Rich Linguistic Features Notebook for PAN at CLEF 2013 , 2013, CLEF.

[163]  Darnes Vilariño Ayala,et al.  Author Clustering using Hierarchical Clustering Analysis , 2017, CLEF.

[164]  Victoria Elizalde Using Statistic and Semantic Analysis to Detect Plagiarism Notebook for PAN at CLEF 2013 , 2013, CLEF.

[165]  Agung Toto Wibowo,et al.  Comparison between fingerprint and winnowing algorithm to detect plagiarism fraud on Bahasa Indonesia documents , 2013, 2013 International Conference of Information and Communication Technology (ICoICT).

[166]  Mona T. Diab,et al.  GWU NLP at SemEval-2016 Shared Task 1: Matrix Factorization for Crosslingual STS , 2016, *SEMEVAL.

[167]  Roman Kern Grammar Checker Features for Author Identification and Author Profiling Notebook for PAN at CLEF 2013 , 2013, CLEF.

[168]  Faramarz Safi Esfahani,et al.  A Plagiarism Detection Approach Based on SVM for Persian Texts , 2016, FIRE.

[169]  Serkan Günal,et al.  Text classification using genetic algorithm oriented latent semantic features , 2014, Expert Syst. Appl..

[170]  Eneko Agirre,et al.  SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation , 2016, *SEMEVAL.

[171]  Azadeh Shakery,et al.  Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information , 2016, Inf. Process. Manag..

[172]  Calton Pu,et al.  Fast Text Classification Using Randomized Explicit Semantic Analysis , 2015, 2015 IEEE International Conference on Information Reuse and Integration.

[173]  Kinam Park,et al.  CopyCaptor : Plagiarized Source Retrieval System using Global Word Frequency and Local Feedback Notebook for PAN at CLEF 2013 , 2013, CLEF.

[174]  Evgeniy Gabrilovich,et al.  Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis , 2007, IJCAI.

[175]  Martin Steinebach,et al.  VEBAV - A Simple, Scalable and Fast Authorship Verification Scheme , 2014, CLEF.

[176]  Preslav Nakov,et al.  Experiments in Authorship-Link Ranking and Complete Author Clustering , 2016, CLEF.

[177]  Moritz Schubotz,et al.  Analyzing Mathematical Content to Detect Academic Plagiarism , 2017, CIKM.

[178]  F. White,et al.  A 5‐year systematic strategy to reduce plagiarism among first‐year psychology university students , 2013 .

[179]  Dipankar Das,et al.  Authorship Verification: An Approach based on Random Forest: Notebook for PAN at CLEF 2015 , 2015, CLEF.

[180]  Efstathios Stamatatos,et al.  Overview of the Author Identification Task at PAN 2013 , 2013, CLEF.

[181]  Paul A. Watters,et al.  Local n-grams for Author Identification Notebook for PAN at CLEF 2013 , 2013, CLEF.

[182]  Lee Gillam,et al.  Guess Again and See if They Line up: Surrey's Runs at Plagiarism Detection Notebook for PAN at CLEF 2013 , 2013, CLEF.

[183]  Sophia Ananiadou,et al.  NaCTeM at SemEval-2016 Task 1: Inferring sentence-level semantic similarity from an ensemble of complementary lexical and sentence-level features , 2016, *SEMEVAL.

[184]  Ali Selamat,et al.  Plagiarism Detection through Internet using Hybrid Artificial Neural Network and Support Vectors Machine , 2014 .

[185]  Xiaozhong Liu,et al.  Semantic Annotation with RescoredESA: Rescoring Concept Features Generated From Explicit Semantic Analysis , 2014, ESAIR '14.

[186]  Mounir Errami,et al.  Responding to Possible Plagiarism , 2009, Science.

[187]  Wael Khreich,et al.  A Survey of Techniques for Event Detection in Twitter , 2015, Comput. Intell..

[188]  Benno Stein,et al.  Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering , 2017, CLEF.

[189]  Roshan G. Ragel,et al.  Plagiarism detection on electronic text based assignments using vector space model , 2014, 7th International Conference on Information and Automation for Sustainability.

[190]  Mahmoud Al-Ayyoub,et al.  Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features , 2017, Inf. Process. Manag..

[191]  Oren Halvani,et al.  Author Clustering based on Compression-based Dissimilarity Scores , 2017, CLEF.

[192]  Ergun Biçici,et al.  RTM at SemEval-2016 Task 1: Predicting Semantic Similarity with Referential Translation Machines and Related Statistics , 2016, *SEMEVAL.

[193]  Niraj Kumar A Graph Based Automatic Plagiarism Detection Technique to Handle Artificial Word Reordering and Paraphrasing , 2014, CICLing.

[194]  Alberto Barrón-Cedeño,et al.  Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection , 2013, CL.

[195]  Rao Muhammad Adeel Nawab,et al.  Author Diarization Using Cluster-Distance Approach , 2016, CLEF.

[196]  Hung-Hsuan Chen,et al.  Classifying and ranking search engine results as potential sources of plagiarism , 2014, DocEng '14.

[197]  Man Lan,et al.  ECNU: Leveraging Word Embeddings to Boost Performance for Paraphrase in Twitter , 2015, *SEMEVAL.

[198]  Piotr Andruszkiewicz,et al.  Samsung Poland NLP Team at SemEval-2016 Task 1: Necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity. , 2016, *SEMEVAL.

[199]  Daniel A. Keim,et al.  An Adaptive Image-based Plagiarism Detection Approach , 2018, JCDL.

[200]  Yurii Palkovskii,et al.  Developing High-Resolution Universal Multi- Type N-Gram Plagiarism Detector Notebook for PAN at CLEF 2014 , 2014 .

[201]  Hung-Hsuan Chen,et al.  Unsupervised Ranking for Plagiarism Source Retrieval Notebook for PAN at CLEF 2013 , 2013, CLEF.

[202]  Mohsen Rashwan,et al.  RDI System for Extrinsic Plagiarism Detection (RDI_RED), Working Notes for PANAraPlagDet at FIRE 2015 , 2015, FIRE Workshops.

[203]  Takeru Yokoi Sentence-Based Plagiarism Detection for Japanese Document Based on Common Nouns and Part-of-Speech Structure , 2014, SoMeT.

[204]  Simon Suchomel,et al.  Diverse Queries and Feature Type Selection for Plagiarism Discovery Notebook for PAN at CLEF 2013 , 2013, CLEF.

[205]  Pearl Brereton,et al.  Systematic literature reviews in software engineering - A systematic literature review , 2009, Inf. Softw. Technol..

[206]  John Walker,et al.  Student Plagiarism in Universities: What are we Doing About it? , 1998 .

[207]  Erik von Elm,et al.  Different patterns of duplicate publication: an analysis of articles used in systematic reviews. , 2004, JAMA.

[208]  Teddi Fishman “We know it when we see it” is not good enough: toward a standard definition of plagiarism that transcends theft, fraud, and copyright , 2009 .

[209]  Alireza Talebpour,et al.  Texts semantic similarity detection based graph approach , 2016, Int. Arab J. Inf. Technol..

[210]  Steffen Scholz,et al.  A concept for plagiarism detection based on compressed bitmaps , 2014, DBKDA 2014.

[211]  Deepa Gupta,et al.  Text plagiarism classification using syntax based linguistic features , 2017, Expert Syst. Appl..

[212]  Moritz Schubotz,et al.  Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations , 2019, 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL).

[213]  Paolo Rosso,et al.  Our Method , 1867, Hall's journal of health.

[214]  Yong Han,et al.  Source Retrieval Based on Learning to Rank and Text Alignment Based on Plagiarism Type Recognition for Plagiarism Detection , 2014, CLEF.

[215]  Benno Stein,et al.  Intrinsic Plagiarism Detection , 2006, ECIR.

[216]  Thamar Solorio,et al.  Using a Variety of n-Grams for the Detection of Different Kinds of Plagiarism Notebook for PAN at CLEF 2013 , 2013, CLEF.

[217]  Daniel L. Fay,et al.  Research collaboration in universities and academic entrepreneurship: the-state-of-the-art , 2012, The Journal of Technology Transfer.

[218]  Zdenek Ceska,et al.  Plagiarism Detection Based on Singular Value Decomposition , 2008, GoTAL.

[219]  Paolo Rosso,et al.  A resource-light method for cross-lingual semantic textual similarity , 2017, Knowl. Based Syst..

[220]  Timo Petmanson,et al.  Authorship Identification Using Correlations of Frequent Features Notebook for PAN at CLEF 2013 , 2013, CLEF.

[221]  Muazzam Ahmed Siddiqui,et al.  DEVELOPING AN ARABIC PLAGIARISM DETECTION CORPUS , 2014 .

[222]  Matthias Hagen,et al.  Source Retrieval for Plagiarism Detection from Large Web Corpora: Recent Approaches , 2015, CLEF.

[223]  Sarah H. Harvey Author Verification using PPM with Parts of Speech Tagging , 2014, CLEF.

[224]  Mahmood Ahmadi,et al.  An efficient and scalable plagiarism checking system using Bloom filters , 2014, Comput. Electr. Eng..

[225]  Eric Medvet,et al.  An Author Verification Approach Based on Differential Features: Notebook for PAN at CLEF 2015 , 2015, CLEF.

[226]  Salar Mohtaj,et al.  Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015 , 2015, CLEF.

[227]  Zhimao Lu,et al.  Detecting High Obfuscation Plagiarism: Exploring Multi-Features Fusion via Machine Learning , 2014 .

[228]  Daniel Castro-Castro,et al.  Discovering Author Groups using a B-compact graph-based Clustering , 2017, CLEF.

[229]  T. Foltýnek,et al.  Impact of Policies for Plagiarism in Higher Education Across Europe: Results of the Project , 2015 .

[230]  Azadeh Shakery,et al.  A Pairwise Document Analysis Approach for Monolingual Plagiarism Detection , 2016, FIRE.

[231]  Felipe Bravo-Marquez,et al.  DOCODE 3.0 (DOcument COpy DEtector): A system for plagiarism detection by applying an information fusion process from multiple documental data sources , 2016, Inf. Fusion.

[232]  Victoria Elizalde.  Using Noun Phrases and Tf-idf for Plagiarized Document Retrieval , 2014, CLEF.

[233]  Ahmed Khorsi,et al.  2L-APD: A Two-Level Plagiarism Detection System for Arabic Documents , 2018 .

[234]  Ilya Sochenkov,et al.  Using Sentence Similarity Measure for Plagiarism Source Retrieval , 2014, CLEF.

[235]  Ari Moesriami Barmawi,et al.  Non-relevant document reduction in anti-plagiarism using asymmetric similarity and AVL tree index , 2014, 2014 5th International Conference on Intelligent and Advanced Systems (ICIAS).

[236]  Jody Condit Fagan,et al.  An evidence-based review of academic web search engines, 2014-2016: Implications for librarians’ practice and research agenda , 2017 .

[237]  Jamal Ahmad Khan.  Style Breach Detection: An Unsupervised Detection Model , 2017, CLEF.

[238]  Philipp Gross,et al.  Plagiarism Alignment Detection by Merging Context Seeds: Notebook for PAN at CLEF 2014 , 2014 .

[239]  Chris Callison-Burch,et al.  PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification , 2015, ACL.

[240]  Alberto Barrón-Cedeño,et al.  Methods for cross-language plagiarism detection , 2013, Knowl. Based Syst.

[241]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[242]  Ali Daud,et al.  Urdu language processing: a survey , 2017, Artificial Intelligence Review.

[243]  Deepa Gupta,et al.  Efficient Paragraph based Chunking and Download Filtering for Plagiarism Source Retrieval , 2015, CLEF.

[244]  Benno Stein,et al.  Overview of the PAN/CLEF 2015 Evaluation Lab , 2015, CLEF.

[245]  George D. C. Cavalcanti,et al.  Combining sentence similarities measures to identify paraphrases , 2018, Comput. Speech Lang.

[246]  Syed Fawad Hussain,et al.  On retrieving intelligently plagiarized documents using semantic similarity , 2015, Eng. Appl. Artif. Intell.

[247]  Mihaela Juganaru-Mathieu,et al.  UJM at CLEF in Author Identification: Notebook for PAN at CLEF 2014 , 2014, CLEF.

[248]  Paolo Rosso,et al.  Overview of the AraPlagDet PAN@FIRE2015 Shared Task on Arabic Plagiarism Detection , 2015, FIRE Workshops.

[249]  Christian Winter,et al.  A Generic Authorship Verification Scheme Based on Equal Error Rates , 2015, CLEF.

[250]  Vishal Goyal,et al.  Maulik: A Plagiarism Detection Tool for Hindi Documents , 2016 .

[251]  Man Lan,et al.  ECNU at SemEval-2016 Task 1: Leveraging Word Embedding From Macro and Micro Views to Boost Performance for Semantic Textual Similarity , 2016, SemEval@NAACL-HLT.

[252]  Davide Buscaldi,et al.  LIPN-CORE: Semantic Text Similarity using n-grams, WordNet, Syntactic Analysis, ESA and Information Retrieval based Features , 2013, *SEMEVAL.

[253]  Ariel Stolerman,et al.  Doppelgänger Finder: Taking Stylometry to the Underground , 2014, 2014 IEEE Symposium on Security and Privacy.

[254]  Pearl Brereton,et al.  Lessons from applying the systematic literature review process within the software engineering domain , 2007, J. Syst. Softw.

[255]  Jacques Savoy,et al.  UniNE at CLEF 2015 Author Identification: Notebook for PAN at CLEF 2015 , 2015, CLEF.

[256]  Amit Prakash,et al.  Experiments on Document Chunking and Query Formation for Plagiarism Source Retrieval , 2014, CLEF.

[257]  Paolo Rosso,et al.  A systematic study of knowledge graph analysis for cross-language plagiarism detection , 2016, Inf. Process. Manag.

[258]  Habibollah Asghari,et al.  Source Retrieval Plagiarism Detection based on Noun Phrase and Keyword Phrase Extraction: Notebook for PAN at CLEF 2015 , 2015 .

[259]  Kelwin Fernandes,et al.  Random Forest with Increased Generalization: A Universal Background Approach for Authorship Verification , 2015, CLEF.

[260]  Aoife Cahill,et al.  Can characters reveal your native language? A language-independent approach to native language identification , 2014, EMNLP.

[261]  David Pinto,et al.  Unsupervised method for the authorship identification task: Notebook for PAN at CLEF 2014 , 2014 .

[262]  Roberto Navigli,et al.  Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity , 2013, ACL.

[263]  Barbara Kitchenham,et al.  Procedures for Performing Systematic Reviews , 2004 .

[264]  Hideo Itoh.  RICOH at SemEval-2016 Task 1: IR-based Semantic Textual Similarity Estimation , 2016, SemEval@NAACL-HLT.

[265]  Henk F. Moed,et al.  The application of bibliometric indicators: Important field- and time-dependent factors to be considered , 1985, Scientometrics.

[266]  Simon Suchomel,et al.  Heterogeneous Queries for Synoptic and Phrasal Search: Notebook for PAN at CLEF 2014 , 2014 .

[267]  Deepa Gupta,et al.  Detection of idea plagiarism using syntax-semantic concept extractions with genetic algorithm , 2017, Expert Syst. Appl.

[268]  Mark Stevenson,et al.  Exploring Word Embeddings and Character N-Grams for Author Clustering , 2016, CLEF.

[269]  Anand,et al.  A Statistical Analysis Approach to Author Identification Using Latent Semantic Analysis , 2014, CLEF.

[270]  Paolo Rosso,et al.  Comparing and combining Content‐ and Citation‐based approaches for plagiarism detection , 2016, J. Assoc. Inf. Sci. Technol.

[271]  El-Sayed M. El-Alfy,et al.  Boosting paraphrase detection through textual similarity metrics with abductive networks , 2015, Appl. Soft Comput.