The distribution of Web citations

A substantial amount of research has focused on the persistence or availability of Web citations. The present study analyzes Web citation distributions. Web citations are defined as mentions of the URLs of Web pages (Web resources) as references in academic papers. The analysis focuses primarily on the URLs of Web citations and uses three sets of data: Set 1 from the Humanities and Social Science Index in China (CSSCI, 1998-2009); Set 2 from two publications of international computer science societies, Communications of the ACM and IEEE Computer (1995-1999); and Set 3 from the medical science database MEDLINE of the National Library of Medicine (1994-2006). Web citation distributions are investigated by Web site type, Web page type, URL frequency, URL depth, URL length, and year of article publication. Results show significant differences in the Web citation distributions among the three data sets. However, when the URLs of Web citations with the same hostnames are aggregated, the distributions in the three data sets are consistent with the power law (the Lotka function).
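As a rough illustration of the URL-level measures named above, the following Python sketch extracts a hostname, a depth, and a length from each cited URL, aggregates citations by hostname, and estimates the exponent of a Lotka-type function f(x) = C / x^alpha. Several points are assumptions made for this sketch and are not taken from the study: depth is counted as the number of non-empty path segments, hostnames are compared case-insensitively, and the exponent is estimated by ordinary least squares on log-log values. Least squares is a simple choice that can be biased on real data; maximum likelihood estimation is generally more reliable. The function names are hypothetical.

```python
from collections import Counter
from math import log
from urllib.parse import urlsplit


def url_features(url):
    """Return (hostname, depth, length) for one cited URL.

    Assumption: depth = number of non-empty path segments.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    depth = len([seg for seg in parts.path.split("/") if seg])
    return host, depth, len(url)


def host_frequency_distribution(urls):
    """Map x -> number of hostnames that are cited exactly x times."""
    per_host = Counter(url_features(u)[0] for u in urls)
    return Counter(per_host.values())


def lotka_exponent(dist):
    """Estimate alpha in f(x) = C / x**alpha from a {x: f(x)} mapping.

    Uses the least-squares slope of log f(x) against log x.
    Needs at least two distinct x values.
    """
    pts = [(log(x), log(f)) for x, f in dist.items()]
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    return -sxy / sxx


# Hypothetical URLs, used only to exercise the functions.
sample_urls = [
    "http://www.example.org/a/b/page.html",
    "http://www.example.org/index.html",
    "http://WWW.example.org/",
    "http://example.com/docs/report.pdf",
    "http://example.net/",
]
distribution = host_frequency_distribution(sample_urls)
```

With real data, `distribution` would be computed separately for each data set, and the fitted exponents could then be compared across sets.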
