Scaling up Link Prediction with Ensembles

A network with $n$ nodes contains $O(n^2)$ possible links. Even for networks of modest size, it is often difficult to evaluate all pairwise possibilities for links in a meaningful way. Furthermore, although link prediction is closely related to missing value estimation problems such as collaborative filtering, sophisticated models such as latent factor methods are often difficult to use because of their computational complexity on very large networks. Because of this complexity, most known link prediction methods are designed to evaluate the link propensity over a specified subset of links, rather than to perform a global search over the entire network. In practice, however, it is essential to perform an exhaustive search over the entire network. In this paper, we propose an ensemble-enabled approach to scaling up link prediction, which decomposes a traditional link prediction problem into subproblems of smaller size. Each subproblem is solved with a latent factor model, which can be implemented effectively on networks of modest size. The ensemble-enabled approach also has several advantages in terms of performance. Experiments on very large networks demonstrate the effectiveness and scalability of ensemble-based latent factor models.
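The decomposition described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the uniform node sampling, the multiplicative-update NMF used as the latent factor model, the max-aggregation of scores across ensemble components, and all function and parameter names (`nmf`, `ensemble_link_scores`, `top_k_links`, `sample_frac`, `rank`) are assumptions made for illustration. The dense score matrix is also only practical for small demonstration graphs.

```python
import numpy as np


def nmf(A, rank, iters=200, seed=0, eps=1e-9):
    """Approximate A by W @ H with nonnegative factors (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(iters):
        # eps keeps the denominators away from zero.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def ensemble_link_scores(A, n_components=10, sample_frac=0.5, rank=4, seed=0):
    """Score node pairs by factorizing sampled subgraphs and combining the results.

    Assumption: each ensemble component is a uniform node sample, and the
    final score of a pair is the maximum over the components containing it.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    size = max(2, int(sample_frac * n))
    scores = np.zeros((n, n))
    for c in range(n_components):
        nodes = rng.choice(n, size=size, replace=False)
        idx = np.ix_(nodes, nodes)
        # Each subproblem is a small adjacency matrix, cheap to factorize.
        W, H = nmf(A[idx], rank, seed=seed + c)
        pred = W @ H
        pred = (pred + pred.T) / 2  # keep scores symmetric for undirected graphs
        scores[idx] = np.maximum(scores[idx], pred)
    scores[A > 0] = 0  # existing links are not candidates
    np.fill_diagonal(scores, 0)  # no self-links
    return scores


def top_k_links(scores, k):
    """Return the k highest-scoring node pairs (i, j) with i < j."""
    rows, cols = np.triu_indices_from(scores, k=1)
    order = np.argsort(scores[rows, cols])[::-1][:k]
    return [(int(rows[i]), int(cols[i])) for i in order]
```

Calling `ensemble_link_scores` on a symmetric 0/1 adjacency matrix and passing the result to `top_k_links` yields a ranked list of candidate links. Each factorization touches only a `size`-by-`size` submatrix, which is the source of the cost saving; pairs that never fall into the same sample keep a score of zero, so in this sketch the number of components and the sampling fraction control coverage.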
