LinkBoost: A Novel Cost-Sensitive Boosting Framework for Community-Level Network Link Prediction

Link prediction is a challenging task due to the inherent skewness of network data. Typical link prediction methods can be categorized as either local or global. Local methods consider the link structure in the immediate neighborhood of a node pair to determine the presence or absence of a link, whereas global methods utilize information from the whole network. This paper presents a community (cluster) level link prediction method that does not need to explicitly identify the communities in a network. Specifically, a variable-cost loss function is defined to address the data skewness problem. We provide a theoretical proof of the equivalence between maximizing the well-known modularity measure used in community detection and minimizing a special case of the proposed loss function. As a result, any link prediction method designed to optimize the loss function would predict more links within a community than between communities. We design a boosting algorithm to minimize the loss function and present an approach to scale up the algorithm by decomposing the network into smaller partitions and aggregating the weak learners constructed from each partition. Experimental results show that our proposed LinkBoost algorithm consistently performs as well as or better than many existing methods when evaluated on four real-world network datasets.
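For readers unfamiliar with the modularity measure referred to above, the following is a minimal sketch of the standard Newman definition, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), for an undirected, unweighted graph. It is not the paper's loss function, whose exact form is not stated in this abstract; the function name `modularity` and the arguments `adj` and `labels` are illustrative assumptions.

```python
def modularity(adj, labels):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    adj    : symmetric 0/1 adjacency matrix as a list of lists (hypothetical input format)
    labels : labels[i] is the community assigned to node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]   # k_i for every node
    two_m = sum(degree)                  # 2m: every edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:   # delta(c_i, c_j)
                # observed link minus the expected number of links
                # under a degree-preserving random graph
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
```

On this toy graph the two-triangle partition scores Q = 5/14, while placing all nodes in a single community scores 0. A partition with high modularity places more links inside communities than a degree-matched random graph would, which is the intuition behind the claim that optimizing the proposed loss favors within-community links.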
