Déjà vu - A study of duplicate citations in Medline

MOTIVATION: Duplicate publication degrades the quality of the scientific corpus and has been difficult to detect; studies thus far have been limited in scope and size. Using text similarity searches, we were able to identify signatures of duplicate citations among a body of abstracts.

RESULTS: A sample of 62,213 Medline citations was examined, and a database of manually verified duplicate citations was created to study author publication behavior. We found that 0.04% of the citations, with no shared authors, were highly similar and are thus potential cases of plagiarism. A further 1.35%, with shared authors, were sufficiently similar to be considered duplicates. Extrapolating, these rates would correspond to 3500 and 117,500 duplicate citations in total, respectively.

AVAILABILITY: eTBLAST, an automated citation matching tool, and Déjà vu, the duplicate citation database, are freely available at http://invention.swmed.edu/ and http://spore.swmed.edu/dejavu
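The extrapolation in the results can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the abstract; the abstract does not state the size of the corpus being extrapolated to, so the "implied corpus size" here is a back-calculation and an assumption, not a number reported by the authors. The function name is hypothetical.

```python
# Figures quoted in the abstract.
sample_size = 62_213
rate_no_shared_authors = 0.0004    # 0.04% of sampled citations
rate_shared_authors = 0.0135       # 1.35% of sampled citations
extrapolated_no_shared = 3_500     # extrapolated total, no shared authors
extrapolated_shared = 117_500      # extrapolated total, shared authors


def implied_corpus_size(extrapolated_count: float, rate: float) -> float:
    """Back-calculate the corpus size that a rate and an extrapolated
    total jointly imply (an assumption; not stated in the abstract)."""
    return extrapolated_count / rate


implied_a = implied_corpus_size(extrapolated_no_shared, rate_no_shared_authors)
implied_b = implied_corpus_size(extrapolated_shared, rate_shared_authors)

# Approximate number of flagged citations inside the sample itself.
in_sample_no_shared = sample_size * rate_no_shared_authors
in_sample_shared = sample_size * rate_shared_authors

print(f"implied corpus size (no shared authors): {implied_a:,.0f}")
print(f"implied corpus size (shared authors):    {implied_b:,.0f}")
print(f"approx. in-sample counts: {in_sample_no_shared:.1f}, {in_sample_shared:.1f}")
```

Both back-calculated corpus sizes come out close to one another, which suggests the two extrapolated totals were derived from the same underlying citation count.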
