Scholar2vec: Vector Representation of Scholars for Lifetime Collaborator Prediction

While scientific collaboration is critical for a scholar, some collaborators can be more significant than others, e.g., lifetime collaborators. It has been shown that lifetime collaborators have a greater influence on a scholar's academic performance. However, little research has investigated how to predict such special relationships in academic networks. To this end, we propose Scholar2vec, a novel neural network embedding for representing scholar profiles. First, our approach creates a research interest vector for each scholar from textual information, such as demographics, research, and influence. After bridging research interests with a collaboration network, vector representations of scholars are obtained through graph learning. Meanwhile, since scholars are characterized by various attributes, we propose to incorporate four types of scholar attributes when learning scholar vectors. Finally, an early-stage similarity sequence based on Scholar2vec is used to predict lifetime collaborators with machine learning methods. Extensive experiments on two real-world datasets show that Scholar2vec outperforms state-of-the-art methods in lifetime collaborator prediction. Our work presents a new way to measure the similarity between two scholars through vector representations, bridging network embedding and academic relationship mining.
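
The final step compares two scholars through an early-stage similarity sequence. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each scholar has one Scholar2vec vector per year of the early collaboration stage. It also assumes that similarity is measured with cosine similarity, which the abstract does not specify. The function names and the summary features (mean, first, last, trend) are hypothetical choices.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_sequence(yearly_a: Sequence[Sequence[float]],
                        yearly_b: Sequence[Sequence[float]]) -> List[float]:
    """Per-year cosine similarity between two scholars.

    Assumption (hypothetical layout): yearly_a[t] and yearly_b[t] are the
    two scholars' vectors in year t of their early collaboration stage.
    """
    if len(yearly_a) != len(yearly_b):
        raise ValueError("both scholars need vectors for the same years")
    return [cosine(a, b) for a, b in zip(yearly_a, yearly_b)]

def sequence_features(seq: Sequence[float]) -> List[float]:
    """Summarize a similarity sequence as [mean, first, last, slope].

    The slope is the least-squares trend of similarity against the
    year index 0..n-1 (0.0 when there is only one year).
    """
    n = len(seq)
    if n == 0:
        raise ValueError("similarity sequence is empty")
    mean = sum(seq) / n
    if n > 1:
        x_mean = (n - 1) / 2
        num = sum((i - x_mean) * (s - mean) for i, s in enumerate(seq))
        den = sum((i - x_mean) ** 2 for i in range(n))
        slope = num / den
    else:
        slope = 0.0
    return [mean, seq[0], seq[-1], slope]

# Toy usage: two scholars observed over three early-stage years.
a = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
seq = similarity_sequence(a, b)
features = sequence_features(seq)
```

Each labeled pair (lifetime collaborator or not) would yield one such feature vector. Those vectors could then be passed to an off-the-shelf classifier, for example a scikit-learn model or XGBoost. Summarizing the sequence into a few features is only one option. The raw sequence could also be fed to a sequence model directly.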
