Digital commons

We estimate the number of scholarly documents available on the web using capture/recapture methods, based on the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar covers nearly 100 million. Of the 114 million, we estimate that at least 27 million (24%) are freely available, in that they do not require a subscription or payment of any kind. At a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields, as defined by Microsoft Academic Search: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary. We further show that the percentage of freely available documents varies significantly among these fields, from 12% to 50%.
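
The abstract does not spell out which capture/recapture estimator is used, so the following is only a minimal sketch of the classic two-sample Lincoln-Petersen estimator, which may differ from the authors' actual procedure. The function name and the input numbers are hypothetical and chosen purely for illustration; they are not figures from the study.

```python
def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
    """Estimate a population size from two overlapping samples.

    n1      -- items found in the first sample (e.g. one search engine)
    n2      -- items found in the second sample (e.g. another search engine)
    overlap -- items found in both samples

    Assumes the two samples are independent draws from the same closed
    population, which is a strong assumption for real search engines.
    """
    if n1 < 0 or n2 < 0:
        raise ValueError("sample sizes must be non-negative")
    if overlap <= 0:
        raise ValueError("overlap must be positive to form an estimate")
    if overlap > min(n1, n2):
        raise ValueError("overlap cannot exceed either sample size")
    # If the samples are independent, overlap / n2 approximates n1 / N,
    # so N is approximately n1 * n2 / overlap.
    return n1 * n2 / overlap


# Hypothetical illustration (not data from the paper):
# engine A returns 800 documents, engine B returns 500, and 400 appear in both.
estimate = lincoln_petersen(800, 500, 400)
print(estimate)  # 1000.0
```

Under the independence assumption, a small overlap relative to the two sample sizes implies a large unseen population, while an overlap close to the smaller sample implies that the larger sample already covers most of it.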
