Moneyball for Academics: Network Analysis for Predicting Research Impact

How are scholars ranked for promotion, tenure, and honors? How can we improve the quantitative tools available to decision makers? Can we predict the academic impact of scholars and papers at early stages using quantitative tools?

Current academic decisions (hiring, tenure, prizes) are mostly very subjective. In the era of “Big Data,” a solid quantitative set of measurements should be used to support this decision process.

This paper presents a method for predicting the probability that a paper will be among the most cited papers, using only data available at the time of publication. We find that highly cited papers have different structural properties and that centrality measures of these properties are associated with increased odds of being in the top percentile of citation count.

The paper also presents a method for predicting the future impact of researchers, using information available early in their careers. This model integrates information about changes in a young researcher’s role in the citation network and the co-authorship network, and demonstrates how this improves predictions of their future impact.

These results show that quantitative methods can complement the qualitative decision-making process in academia and improve the prediction of academic impact.
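The link between centrality and the odds of being highly cited can be illustrated with a small sketch. It is not the paper's model. It assumes a toy co-authorship graph, uses normalized degree centrality as a stand-in for whichever centrality measures the paper uses, and computes a simple odds ratio from a 2x2 table. All names, data, and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as edge pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    # Divide by the maximum possible degree (n - 1).
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def odds_ratio(records, threshold):
    """Odds of being top cited for high-centrality vs. low-centrality papers.

    records: iterable of (centrality, is_top_cited) pairs.
    A 0.5 continuity correction is added to every cell so that an
    empty cell does not cause a division by zero.
    """
    high_top = high_rest = low_top = low_rest = 0
    for centrality, is_top in records:
        if centrality >= threshold:
            if is_top:
                high_top += 1
            else:
                high_rest += 1
        else:
            if is_top:
                low_top += 1
            else:
                low_rest += 1
    return ((high_top + 0.5) * (low_rest + 0.5)) / (
        (high_rest + 0.5) * (low_top + 0.5)
    )


# Hypothetical co-authorship edges known at publication time.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]
centrality = degree_centrality(edges)

# Hypothetical papers: (lead author, ended up in the top citation percentile).
papers = [
    ("A", True), ("A", True), ("B", False), ("C", True),
    ("D", False), ("E", False), ("F", False), ("F", False),
]
records = [(centrality[author], is_top) for author, is_top in papers]
ratio = odds_ratio(records, threshold=0.5)
print(f"odds ratio (high vs. low centrality): {ratio:.2f}")
```

An odds ratio above 1 in this toy table means that papers by high-centrality authors are more often top cited. A real analysis would more likely fit a regression model over several centrality measures on a large citation data set.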
