A Graph Convolutional Neural Network based Framework for Estimating Future Citations Count of Research Articles

Scientific publications play a vital role in a researcher's career. However, some articles become more popular than others within the research community and subsequently drive future research directions. One indicator of an article's popularity is the number of citations it receives. The citation count, which is also the basis for various other metrics such as the journal impact factor and the h-index, is an essential measure for assessing a scientific paper’s quality. In this work, we propose a Graph Convolutional Network (GCN) based framework for estimating the future citations of research publications over both short-term (1-year) and long-term (5-year and 10-year) horizons. We tested the proposed approach on the AMiner dataset, specifically on research articles from the computer science domain, comprising more than 0.8 million articles. Using both conventional and graph-based features, we compared machine learning algorithms (Linear Regression, Random Forest, XGBoost, and Deep Neural Networks) as baseline methods against our GCN-based approach, which outperforms the baselines in terms of error rates and R value, indicating the robustness of the model.
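To make the idea of graph convolution over a citation network concrete, the following is a minimal sketch of a two-layer GCN forward pass for node-level regression. It uses the widely known propagation rule from Kipf and Welling's "Semi-Supervised Classification with Graph Convolutional Networks" and is not necessarily the architecture used in this work. The toy graph, the feature dimensions, the random weights, and the function names `normalize_adjacency` and `gcn_forward` are all illustrative assumptions, as is the choice to treat citation links as undirected.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}: symmetric normalization with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops so each node keeps its own features
    deg = a_hat.sum(axis=1)              # degree is >= 1 because of the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj, features, w1, w2):
    """Two graph-convolution layers; the second has no activation (regression output)."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w1, 0.0)  # ReLU
    return (a_norm @ hidden @ w2).ravel()             # one predicted value per node


# Hypothetical toy graph: 4 papers, links symmetrized (an assumption of this sketch).
adj = np.array(
    [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))  # 3 made-up per-paper features
w1 = rng.normal(size=(3, 8))        # untrained weights, for shape illustration only
w2 = rng.normal(size=(8, 1))

pred = gcn_forward(adj, features, w1, w2)
print(pred.shape)
```

In a real setting the weights would be learned by minimizing a regression loss (for example, mean squared error against observed citation counts at the chosen horizon), and the adjacency matrix would be stored in sparse form to handle graphs with hundreds of thousands of nodes.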
