The Child is Father of the Man: Foresee the Success at the Early Stage

Understanding the dynamic mechanisms that drive high-impact scientific work (e.g., research papers, patents) is a long-debated research topic with many important implications, ranging from personal career development and recruitment search to the allocation of research resources. Recent advances in characterizing and modeling scientific success have made it possible to forecast the long-term impact of scientific work, with data mining techniques, supervised learning in particular, playing an essential role. Despite much progress, several key algorithmic challenges in predicting long-term scientific impact remain largely open. In this paper, we propose a joint predictive model to forecast long-term scientific impact at an early stage, which simultaneously addresses a number of these open challenges, including scholarly feature design, non-linearity, domain heterogeneity, and dynamics. In particular, we formulate the prediction task as a regularized optimization problem and propose effective and scalable algorithms to solve it. We perform extensive empirical evaluations on large, real scholarly data sets to validate the effectiveness and efficiency of our method.
