Modelling citation networks

The distribution of citation counts for papers published in the same year is remarkably similar from year to year. We characterise the shape of each distribution by a ‘width’, $\sigma^2$, obtained by fitting a log-normal to it, and find that the width is approximately constant across publication years. This similarity is not surprising: after all, why would papers from one year be cited more than those from another? Nevertheless, we show that simple citation models fail to capture this behaviour. We then provide a simple three-parameter citation network model which reproduces the correct width over time. We test the model on the citation network of papers from the hep-th section of arXiv. Our final model reproduces the observed width when around 20% of the citations in the model are made to recently published papers anywhere in the network (‘global information’). The remaining 80% of citations are made using the references in these papers’ bibliographies (‘local searches’). This split is consistent with other studies, though our motivation, reproducing the stable shape of the distribution over time, is very different. Finally, we find that, in the citation network model, varying the number of papers referenced by a new publication is important, as it alters the values of the model parameters fitted to the data. This is not addressed in current models and needs further work.
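To make the global/local mechanism concrete, the following is a minimal sketch in Python and not the authors' implementation. Several details are assumptions not given in the abstract: every paper cites a fixed number of references (`n_refs`), "recently published" means the last `window` papers, choices are uniform at random, and the width is approximated by the sample variance of the log of positive citation counts rather than by a full log-normal fit. The function names `grow_network`, `citation_counts` and `log_width` are hypothetical.

```python
import math
import random


def log_width(counts):
    """Sample variance of ln(c) over papers with c > 0.

    Assumption: a crude stand-in for the width of a fitted log-normal.
    """
    logs = [math.log(c) for c in counts if c > 0]
    if len(logs) < 2:
        raise ValueError("need at least two cited papers")
    mean = sum(logs) / len(logs)
    return sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)


def grow_network(n_papers, n_refs=5, p_global=0.2, window=50, seed=0):
    """Grow a toy citation network; returns one bibliography per paper.

    Each citation is made either globally (to a uniformly chosen paper
    among the last `window` papers, probability p_global) or locally
    (copied from the bibliography of a paper already cited).
    """
    rng = random.Random(seed)
    refs = [[]]  # paper 0 has nothing to cite
    for new in range(1, n_papers):
        chosen = []
        target = min(n_refs, new)
        attempts = 0
        while len(chosen) < target and attempts < 100 * n_refs:
            attempts += 1
            # Candidates for a local search: references of cited papers.
            local_pool = [r for c in chosen for r in refs[c] if r not in chosen]
            if local_pool and rng.random() >= p_global:
                pick = rng.choice(local_pool)
            else:
                # Global step, also used when no local candidate exists.
                pick = rng.randrange(max(0, new - window), new)
            if pick not in chosen:
                chosen.append(pick)
        refs.append(sorted(chosen))
    return refs


def citation_counts(refs):
    """Number of citations received by each paper (in-degree)."""
    counts = [0] * len(refs)
    for bibliography in refs:
        for cited in bibliography:
            counts[cited] += 1
    return counts


network = grow_network(500)
print("width estimate:", log_width(citation_counts(network)))
```

Grouping papers by publication year and comparing `log_width` across groups would mimic the comparison described above. Because the first citation of every paper is necessarily global in this sketch, the realised share of global citations is somewhat higher than `p_global`.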
