Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal

Abstract: A central issue in evaluative bibliometrics is the characterization of the citation distribution of papers in the scientific literature. Here, we perform a large-scale empirical analysis of journals from every field in Thomson Reuters' Web of Science database. We find that only 30 of the 2,184 journals have citation distributions that are inconsistent with a discrete lognormal distribution at the rejection threshold that controls the false discovery rate at 0.05. Large, multidisciplinary journals are over-represented in this set of 30 journals, leading us to conclude that, within a discipline, citation distributions are lognormal. Our results strongly suggest that the discrete lognormal distribution is a globally accurate model for the distribution of “eventual impact” of scientific papers published in a single-discipline journal in a single year sufficiently removed from the present date.
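The abstract refers to a rejection threshold that controls the false discovery rate at 0.05 across many journal-level tests. As a minimal sketch of how such a threshold can be obtained, the following implements the standard Benjamini-Hochberg step-up procedure. This is an assumption about the general technique, not a reproduction of the authors' pipeline; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, one per input p-value, that is True
    where the corresponding null hypothesis is rejected at FDR level q.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per journal-level goodness-of-fit test.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_rejected = benjamini_hochberg(example_p, q=0.05)
```

In this setting, each p-value would come from testing one journal's citation counts against the fitted discrete lognormal model, and a rejected hypothesis would mark a journal as inconsistent with the model. Because the procedure is step-up, a p-value can be rejected even if it exceeds its own rank threshold, provided a larger p-value meets its threshold.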
