Predicting Citation Counts Using Text and Graph Mining

As the volume of scientific literature grows, it becomes increasingly difficult for researchers to identify promising papers that are likely to become influential in their field. We study the problem of predicting the future citation counts of papers using only information available at the time of publication; in our pilot study, the prediction horizon is five years. We apply machine learning techniques to a dataset of millions of academic papers from several research domains to identify predictive features, including venue reputation, authors and institutions, citation networks, and content measures. We determine how the predictive power of these features varies across domains and discuss possible reasons why differing citation behaviors might lead to these differences.
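The idea of features being differentially predictive across domains can be illustrated with a small sketch. It is a hypothetical example, not the study's method. The feature names (`venue_score`, `author_h_index`, `num_refs`), the toy records, and the choice of a per-domain Pearson correlation against log(1 + citations five years after publication) are all illustrative assumptions. The log transform is one common way to damp the heavy tail of citation counts.

```python
import math
from collections import defaultdict

# Hypothetical toy records: (domain, features known at publication time,
# citation count five years later). Names and values are illustrative only.
PAPERS = [
    ("cs", {"venue_score": 0.2, "author_h_index": 10, "num_refs": 25}, 3),
    ("cs", {"venue_score": 0.5, "author_h_index": 4, "num_refs": 40}, 15),
    ("cs", {"venue_score": 0.9, "author_h_index": 8, "num_refs": 30}, 120),
    ("cs", {"venue_score": 0.7, "author_h_index": 12, "num_refs": 35}, 40),
    ("bio", {"venue_score": 0.6, "author_h_index": 30, "num_refs": 50}, 90),
    ("bio", {"venue_score": 0.8, "author_h_index": 5, "num_refs": 45}, 8),
    ("bio", {"venue_score": 0.4, "author_h_index": 18, "num_refs": 60}, 35),
    ("bio", {"venue_score": 0.7, "author_h_index": 12, "num_refs": 40}, 20),
]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def feature_correlations_by_domain(papers):
    """Correlate each feature with log(1 + future citations), separately per domain."""
    by_domain = defaultdict(list)
    for domain, feats, cites in papers:
        # log1p compresses the long tail of highly cited papers.
        by_domain[domain].append((feats, math.log1p(cites)))
    result = {}
    for domain, rows in by_domain.items():
        targets = [t for _, t in rows]
        names = sorted(rows[0][0])
        result[domain] = {
            name: pearson([f[name] for f, _ in rows], targets) for name in names
        }
    return result


if __name__ == "__main__":
    for domain, corrs in feature_correlations_by_domain(PAPERS).items():
        print(domain, {k: round(v, 3) for k, v in corrs.items()})
```

In this toy data, venue score tracks citations closely in the `cs` records, while author h-index does so in the `bio` records. This is the kind of per-domain contrast the abstract describes. A real study would fit a multivariate model and evaluate it on held-out papers rather than rely on single-feature correlations.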
