Mining advisor-advisee relationships from research publication networks

Information networks contain abundant knowledge about relationships among people or entities. Unfortunately, such knowledge is often hidden in a network where different kinds of relationships are not explicitly categorized. For example, in a research publication network, the advisor-advisee relationships among researchers are hidden in the coauthor network. Discovering those relationships can benefit many applications, such as expert finding and research community analysis. In this paper, we take a computer science bibliographic network as an example to analyze the roles of authors and to discover the likely advisor-advisee relationships. In particular, we propose a time-constrained probabilistic factor graph model (TPFG), which takes a research publication network as input and models the advisor-advisee relationship mining problem using a joint likelihood objective function. We further design an efficient learning algorithm to optimize the objective function. Based on the learned model, we suggest and rank probable advisors for every author. Experimental results show that the proposed approach infers advisor-advisee relationships efficiently and achieves state-of-the-art accuracy (80-90%). We also apply the discovered advisor-advisee relationships to bole search, a specific expert finding task, and an empirical study shows that the search performance can be effectively improved (+4.09% by NDCG@5).
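
To make the NDCG@5 figure concrete, the following is a minimal sketch of how NDCG@k can be computed for one ranked result list. It is an illustration under stated assumptions, not the evaluation code used in this paper: the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, the gain is taken to be linear in the relevance label (other variants use an exponential gain), and the ideal ranking is derived from the same list of labels.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items.

    Assumption: linear gain rel / log2(rank + 1), with ranks starting at 1.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering.

    Assumption: `relevances` holds the labels of all judged items for the
    query, in the order the system ranked them.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels for one query, in ranked order.
print(ndcg_at_k([1, 0, 1, 0, 0], k=5))
```

A score of 1.0 means the top-k items are already in the best possible order; averaging the score over all queries gives a single number that can be compared between two ranking methods.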
