The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression

Identifying the statistical distribution that best fits citation data is important for robust and powerful quantitative analyses. Although previous studies have suggested that both the hooked power law and the discretised lognormal distribution fit better than the power law and negative binomial distributions, no comparison so far has covered all articles within a discipline, including those that are uncited. Based on an analysis of 26 Scopus subject areas in seven different years, this article compares the fit of the discretised lognormal and the hooked power law to citation data, adding 1 to each citation count so that uncited articles (zeros) can be included. The hooked power law fits better in two thirds of the subject/year combinations tested for journal articles that are at least three years old, including most medical, life and natural sciences, and in virtually all subject areas for younger articles. Conversely, the discretised lognormal tends to fit best for arts, humanities, social science and engineering fields. The difference between the fits of the two distributions is mostly small, however, so either could reasonably be used for modelling citation data. For regression analyses, the best option is ordinary least squares regression applied to the natural logarithm of citation counts plus one, especially for sets of younger articles, because of the increased precision of the parameter estimates.
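The two modelling steps summarised above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The functional forms are the commonly stated ones, assumed here because the abstract does not give them: a hooked power law with p(x) proportional to 1/(B + x)^alpha, and a discretised lognormal with p(x) proportional to (1/x) exp(-(ln x - mu)^2 / (2 sigma^2)), both on x = 1, 2, ... after adding 1 to each citation count. Normalising over a truncated support (`x_max`) is a simplification, and the function names, toy data and parameter values are all hypothetical.

```python
import numpy as np


def hooked_power_law_logpmf(x, alpha, B, x_max=100_000):
    """Log-probability under p(x) proportional to 1/(B + x)^alpha, x = 1..x_max.

    Assumed form; the normalising constant is a sum over a truncated support.
    """
    support = np.arange(1, x_max + 1, dtype=float)
    log_norm = np.log(np.sum((B + support) ** (-alpha)))
    return -alpha * np.log(B + np.asarray(x, dtype=float)) - log_norm


def discretised_lognormal_logpmf(x, mu, sigma, x_max=100_000):
    """Log-probability under a lognormal density renormalised on x = 1..x_max.

    Assumed form: p(x) proportional to (1/x) exp(-(ln x - mu)^2 / (2 sigma^2)).
    """
    def log_kernel(v):
        return -np.log(v) - (np.log(v) - mu) ** 2 / (2.0 * sigma ** 2)

    support = np.arange(1, x_max + 1, dtype=float)
    log_norm = np.log(np.sum(np.exp(log_kernel(support))))
    return log_kernel(np.asarray(x, dtype=float)) - log_norm


def log_citations(citations):
    """Return ln(1 + c), so that uncited articles map to 0."""
    return np.log1p(np.asarray(citations, dtype=float))


def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


# Hypothetical toy data: citation counts (including zeros) and one predictor.
citations = np.array([0, 1, 3, 7, 0, 2, 15, 4])
n_authors = np.array([1, 2, 3, 5, 1, 2, 6, 3])

# Shift by 1 so that uncited articles fall inside the support x >= 1.
shifted = citations + 1

# Log-likelihoods at illustrative (not fitted) parameter values.
ll_hooked = hooked_power_law_logpmf(shifted, alpha=3.0, B=10.0).sum()
ll_lognormal = discretised_lognormal_logpmf(shifted, mu=1.0, sigma=1.0).sum()
print("hooked power law log-likelihood:", ll_hooked)
print("discretised lognormal log-likelihood:", ll_lognormal)

# OLS regression on ln(1 + citations).
coef = ols_fit(n_authors, log_citations(citations))
print("intercept, slope:", coef)
```

In practice the distribution parameters would be estimated by maximising these log-likelihoods rather than fixed by hand, and the two fits would then be compared. The regression part shows only the transformation and the least squares step, without standard errors or diagnostics.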
