The productivity of top researchers: a semi-nonparametric approach

Research productivity distributions exhibit heavy tails because a few researchers commonly accumulate the majority of the top publications and their corresponding citations. Measurements of this productivity are very sensitive to the field being analyzed and to the distribution assumed. In particular, the lognormal and similar distributions seem to systematically underestimate the productivity of the top researchers. In this article, we propose a log-semi-nonparametric (log-SNP) distribution that nests the lognormal and captures the heavy tail of the productivity distribution through additional parameters linked to high-order moments. The application uses two data sets: scientific production data on 140,971 researchers who produced 253,634 publications in 18 fields of knowledge (O’Boyle and Aguinis in Pers Psychol 65(1):79–119, 2012), and finance publications from 330 academic institutions (Borokhovich et al. in J Finance 50(5):1691–1717, 1995). The log-SNP distribution outperforms the lognormal on these data and provides more accurate measures of the high quantiles of the productivity distribution.
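To make the nesting idea concrete, the following is a minimal sketch of a log-SNP density written as a lognormal density multiplied by a Gram-Charlier-type series in probabilists' Hermite polynomials. The abstract does not state the exact parameterization, so this functional form, the function names `lognormal_pdf` and `log_snp_pdf`, and the coefficient vector `d` are assumptions made for illustration only. Setting every entry of `d` to zero recovers the lognormal.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def lognormal_pdf(x, mu=0.0, sigma=1.0):
    """Lognormal density evaluated at x > 0."""
    x = np.asarray(x, dtype=float)
    z = (np.log(x) - mu) / sigma
    return np.exp(-0.5 * z**2) / (x * sigma * np.sqrt(2.0 * np.pi))


def log_snp_pdf(x, mu=0.0, sigma=1.0, d=()):
    """Assumed log-SNP form: lognormal times (1 + sum_s d_s * He_s(z)).

    d = (d_1, d_2, ...) are hypothetical weights on the Hermite
    polynomials He_1, He_2, ...; they are not taken from the paper.
    The result integrates to one for any d, but it is only a valid
    density when the bracketed series stays non-negative.
    """
    x = np.asarray(x, dtype=float)
    z = (np.log(x) - mu) / sigma
    # Coefficient 1.0 multiplies He_0(z) = 1, so d = () gives the lognormal.
    coeffs = np.concatenate(([1.0], np.asarray(d, dtype=float)))
    return lognormal_pdf(x, mu, sigma) * hermeval(z, coeffs)


if __name__ == "__main__":
    x_tail = np.exp(3.0)  # three standard deviations out on the log scale
    ratio = log_snp_pdf(x_tail, d=[0.0, 0.0, 0.0, 0.05]) / lognormal_pdf(x_tail)
    print(f"log-SNP / lognormal density ratio in the tail: {ratio:.3f}")
```

In this sketch a positive weight on the fourth Hermite polynomial raises the density far out in the right tail relative to the lognormal, which is the kind of behaviour the abstract attributes to the parameters linked to high-order moments. In practice the coefficients would be estimated, for example by maximum likelihood, subject to the positivity caveat noted in the docstring.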
