Can Scientific Impact Be Predicted?

A widely used measure of scientific impact is citations. However, due to their heavy-tailed distribution, citations are fundamentally difficult to predict. Instead, to characterize scientific impact, we address two analogous questions asked by many scientific researchers: “How will my h-index evolve over time, and which of my previously or newly published papers will contribute to it?” To answer these questions, we perform two related tasks. First, we develop a model to predict authors' future h-indices based on their current scientific impact. Second, we examine the factors that drive papers (either previously or newly published) to increase their authors' predicted future h-indices. By leveraging relevant factors, we can predict an author's h-index in five years with an R² value of 0.92, and predict whether a previously (newly) published paper will contribute to this future h-index with an F1 score of 0.99 (0.77). We find that topical authority and publication venue are crucial to these effective predictions, while topic popularity is surprisingly inconsequential. Further, we develop an online tool that allows users to generate informed h-index predictions. Our work demonstrates the predictability of scientific impact, and can help researchers effectively leverage their scholarly position of “standing on the shoulders of giants”.
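
For readers unfamiliar with the quantity being predicted, the following is a minimal sketch of the standard h-index definition: the largest h such that an author has h papers with at least h citations each. The function names `h_index` and `contributes_to_h_index` are hypothetical, and the contribution rule (a paper contributes when its citation count is at least the author's h-index) is an assumption made for illustration; it is not taken from the prediction model described above.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def contributes_to_h_index(paper_citations: int, all_citations: Sequence[int]) -> bool:
    """Assumed rule: a paper contributes if its citations reach the author's h-index."""
    h = h_index(all_citations)
    return h > 0 and paper_citations >= h


if __name__ == "__main__":
    record = [10, 8, 5, 4, 3]  # illustrative citation counts, not real data
    print(h_index(record))
    print(contributes_to_h_index(5, record))
```

Under this reading, forecasting whether a paper will contribute amounts to comparing its future citation count against the author's future h-index.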
