Prediction in Heterogeneous Bibliographic Networks

Link analysis has been studied from several perspectives in recent years as a way to reveal information hidden in the link space of bibliographic networks. In this paper, we address a novel problem, citation prediction: given the authors, topics, target publication venue, and publication time of a research paper, predict the citation relationships between this query paper and a set of previously published papers. Because of the very large number of candidate papers, the loosely connected structure of the citation network, and the highly skewed distribution of citation relations, citation prediction is more challenging than previously studied link prediction problems. We propose a two-phase citation probability learning approach that builds a meta-path based prediction model on a topic-discriminative search space, so that citation relationships can be predicted both effectively and efficiently. Experiments on a real-world dataset, evaluated with comprehensive measurements, demonstrate that our framework has substantial advantages over commonly used link prediction approaches in predicting citation relations in bibliographic networks.
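
To make the notion of a meta-path based feature concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes a simple path-count feature along the two-hop meta-path paper-author-paper; the toy data, the function name `path_count`, and the choice of meta-path are all hypothetical and serve only as illustration.

```python
from collections import defaultdict


def path_count(first_hop, second_hop, source):
    """Count instances of a two-hop meta-path starting at `source`.

    first_hop maps a source node to its middle-node neighbours
    (here: paper -> authors); second_hop maps a middle node to its
    target-node neighbours (here: author -> papers).
    """
    counts = defaultdict(int)
    for mid in first_hop.get(source, []):
        for target in second_hop.get(mid, []):
            if target != source:  # ignore paths that return to the query paper
                counts[target] += 1
    return dict(counts)


# Hypothetical toy network: which authors wrote which paper.
paper_authors = {
    "query": ["alice", "bob"],
    "p1": ["alice"],
    "p2": ["alice", "bob"],
    "p3": ["carol"],
}

# Invert the relation to obtain author -> papers.
author_papers = defaultdict(list)
for paper, authors in paper_authors.items():
    for author in authors:
        author_papers[author].append(paper)

# Path counts between the query paper and each candidate paper
# along the assumed meta-path paper-author-paper.
features = path_count(paper_authors, author_papers, "query")
print(features)
```

In this sketch a candidate paper that shares more authors with the query paper receives a larger count, and a paper with no shared author does not appear at all. A prediction model could combine such counts over several meta-paths (for example through topics or venues) as input features.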
