Citation-based Plagiarism Detection

Plagiarism is a problem with far-reaching consequences for the sciences. However, even today's best software-based systems can reliably identify only copy-and-paste plagiarism. Disguised forms of plagiarism, including paraphrased text, cross-language plagiarism, and structural and idea plagiarism, often remain undetected. Because of this weakness, a large percentage of scientific plagiarism goes undetected. Bela Gipp provides an overview of the state of the art in plagiarism detection and analyzes why these approaches fail to detect disguised plagiarism. To address this shortcoming, the author proposes Citation-based Plagiarism Detection. Unlike character-based approaches, it does not rely on text comparison alone; instead, it analyzes the citation patterns within documents to form a language-independent "semantic fingerprint" for assessing similarity. The approach proved practicable by identifying plagiarism in scientific publications that had so far not been detectable by machine.
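To make the idea of a citation-based fingerprint more concrete, here is a minimal Python sketch. It assumes each document has already been reduced to the ordered list of identifiers of the sources it cites. Extracting references and matching them across documents is not shown, and the identifiers "A" to "E" are hypothetical. The sketch compares documents with two simple measures: bibliographic coupling, which counts the sources both documents cite, and a longest-common-subsequence comparison of citation order in the spirit of the longest common citation sequence measure. These are illustrative choices and not necessarily the exact algorithms the book uses.

```python
"""Sketch: compare documents by their citation patterns instead of their text.

Assumption: each document is already reduced to the ordered list of IDs of
the sources it cites; the IDs below are hypothetical placeholders.
"""


def bibliographic_coupling(a: list[str], b: list[str]) -> int:
    """Number of distinct sources cited by both documents."""
    return len(set(a) & set(b))


def longest_common_citation_sequence(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two citation orders.

    Standard O(len(a) * len(b)) dynamic programming with two rows. Shared
    citations that appear in the same relative order score higher than the
    same citations in a shuffled order.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend a common run
            else:
                curr[j] = max(prev[j], curr[j - 1])  # best so far
        prev = curr
    return prev[-1]


# Hypothetical example: a source and a suspect (e.g. a translation) whose
# texts share no words but reuse the source's citations in the same order.
source = ["A", "B", "C", "A", "D"]
suspect = ["B", "C", "E", "A", "D"]
# Control: cites the same sources, but in an unrelated order.
shuffled = ["D", "A", "C", "B"]

if __name__ == "__main__":
    print(bibliographic_coupling(source, suspect))               # 4
    print(longest_common_citation_sequence(source, suspect))     # 4
    print(bibliographic_coupling(source, shuffled))              # 4
    print(longest_common_citation_sequence(source, shuffled))    # 2
```

Because the comparison in this sketch uses only citation identifiers, the language and wording of the surrounding text have no effect on the score. The shuffled control shows why citation order matters: coupling alone rates both documents equally, while the order-aware measure separates a document that merely shares sources from one that reproduces the source's citation sequence.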
