Predicting Long-Term Scientific Impact Based on Multi-Field Feature Extraction

Many studies have evaluated the scientific impact of scholars, but effective methods for predicting long-term impact, particularly impact 10 years into the future, are still lacking. We therefore propose a long-term scientific impact prediction model based on multi-field feature extraction. The workflow of the proposed model consists of two stages: feature engineering and model ensemble. In feature engineering, we extract attribute features, time-series features, and heterogeneous network features drawn from three different fields. When extracting the heterogeneous network features, we also propose a scientific impact evaluation method based on a heterogeneous academic network that accounts for both the time of publication and the order of authors. In the model ensemble, we fit a basic model and a noise model to different training sets so as to make full use of the information in the original dataset. Experimental results demonstrate that the proposed model consistently improves the accuracy of scholars' scientific impact prediction, and it also offers a prediction pattern for long-term prediction problems.
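The abstract does not say how publication time and author order enter the heterogeneous-network impact score. The sketch below is one hypothetical way to combine them and is not the authors' method: paper scores come from plain PageRank over the citation graph, each paper's score is discounted by an assumed exponential factor exp(-decay * (current_year - year)), and credit is split among co-authors with assumed normalized harmonic weights (the k-th author gets 1/k). The names `Paper`, `author_order_weights`, `paper_pagerank`, and `author_impact`, as well as the `damping` and `decay` values, are illustrative only.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Paper:
    """A paper node in a hypothetical paper-author network."""
    year: int
    authors: list[str]  # byline order: authors[0] is the first author
    references: list[str] = field(default_factory=list)  # ids of cited papers


def author_order_weights(n: int) -> list[float]:
    """Assumed harmonic credit: the k-th author gets 1/k, normalized to sum to 1."""
    raw = [1.0 / k for k in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def paper_pagerank(papers: dict[str, Paper], damping: float = 0.85,
                   iters: int = 50) -> dict[str, float]:
    """Plain PageRank on the citation graph (edges run citing -> cited)."""
    ids = list(papers)
    n = len(ids)
    score = {p: 1.0 / n for p in ids}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in ids}
        for p in ids:
            refs = [q for q in papers[p].references if q in papers]
            if refs:
                # Split this paper's score evenly among the papers it cites.
                share = damping * score[p] / len(refs)
                for q in refs:
                    new[q] += share
            else:
                # Dangling paper (no known references): spread its score uniformly.
                for q in ids:
                    new[q] += damping * score[p] / n
        score = new
    return score


def author_impact(papers: dict[str, Paper], current_year: int,
                  decay: float = 0.1) -> dict[str, float]:
    """Sum time-discounted, author-order-weighted paper scores per author."""
    pr = paper_pagerank(papers)
    impact: dict[str, float] = {}
    for pid, paper in papers.items():
        # Assumed time factor: exponential discount on the paper's age.
        time_w = math.exp(-decay * (current_year - paper.year))
        weights = author_order_weights(len(paper.authors))
        for author, w in zip(paper.authors, weights):
            impact[author] = impact.get(author, 0.0) + pr[pid] * time_w * w
    return impact


if __name__ == "__main__":
    papers = {
        "p1": Paper(2010, ["alice", "bob"]),
        "p2": Paper(2012, ["bob"], ["p1"]),
        "p3": Paper(2014, ["carol", "alice"], ["p1", "p2"]),
    }
    for author, score in sorted(author_impact(papers, current_year=2014).items()):
        print(f"{author}: {score:.4f}")
```

Because the time factor and the author-order weights are separate functions, either assumption (for example, equal credit for all co-authors, or no time discount) can be swapped out without touching the PageRank step.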
