URL decay in MEDLINE - a 4-year follow-up study

MOTIVATION Internet-based electronic resources, as given by Uniform Resource Locators (URLs), are being increasingly used in scientific publications but are also becoming inaccessible in a time-dependent manner, a phenomenon documented across disciplines. Initial reports brought attention to the problem, spawning methods of effectively preserving URL content, while some journals adopted policies regarding URL publication and began storing supplementary information on journal websites. Thus, a reexamination of URL growth and decay in the literature is merited to see whether the problem has grown or been mitigated by any of these changes.

RESULTS After the 2003 study, three follow-up studies were conducted in 2004, 2005 and 2007. Unfortunately, no significant change was found in the rate of URL decay among any of the studies. However, only 5% of URLs cited more than twice have decayed, versus 20% of URLs cited once or twice. The most common types of lost content were computer programs (43%), followed by scholarly content (38%) and databases (19%). Compared to URLs still available, no lost content type was significantly over- or underrepresented. A Google search for 30 of these websites found 11 (37%) relocated to different URLs.

CONCLUSIONS URL decay continues unabated, but URLs published by organizations tend to be more stable. Repeated citation of URLs suggests that calculating an electronic impact factor (eIF) would be an objective, quantitative way to measure the impact of Internet-based resources on scientific research.
