Link Prediction in Sparse Networks by Incidence Matrix Factorization

Link prediction plays an important role in many areas of artificial intelligence, including social network analysis and bioinformatics; however, it is often hampered by data sparsity. In this paper, we present and validate the hypothesis that, for sparse networks, incidence matrix factorization (IMF) can perform better than adjacency matrix factorization (AMF), the approach used in many previous studies. A key observation supporting this hypothesis is that IMF models a partially observed graph more accurately than AMF. A technical challenge in validating the hypothesis, however, is that, unlike with AMF, there is no obvious way to make link predictions from a factorized incidence matrix. To address this, we develop an optimization-based link prediction method. We then conduct thorough experiments on both synthetic and real-world datasets to investigate the relationship between the sparsity of a network and the predictive performance of the two factorization approaches. Our experimental results show that IMF performed better than AMF as networks became sparser, which validates our hypothesis.
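To make the two representations concrete, the sketch below builds both matrices for a small hypothetical graph with NumPy. In the adjacency matrix, every unobserved pair is stored as 0, exactly like a confirmed non-link, whereas the unsigned incidence matrix has one column per observed edge only. The two are related by the standard identity B Bᵀ = D + A (the signless Laplacian). The `amf_scores` helper is a generic, hypothetical AMF baseline (rank-k truncated SVD used as link scores), not the specific AMF model evaluated in the paper, and the code does not implement the paper's optimization-based IMF prediction method, which the abstract does not specify.

```python
import numpy as np

def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph with n nodes."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

def incidence_matrix(n, edges):
    """Unsigned n x m incidence matrix: column j has 1s at both endpoints of edge j."""
    B = np.zeros((n, len(edges)))
    for j, (u, v) in enumerate(edges):
        B[u, j] = B[v, j] = 1.0
    return B

def amf_scores(A, rank):
    """Hypothetical generic AMF baseline: rank-k truncated SVD reconstruction of A."""
    U, s, Vt = np.linalg.svd(A)
    # Scale the first `rank` left singular vectors by their singular values.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

# Observed edges of a hypothetical 5-node graph (illustration only).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
n = 5
A = adjacency_matrix(n, edges)
B = incidence_matrix(n, edges)
D = np.diag(A.sum(axis=1))  # degree matrix

# For an unsigned incidence matrix of a simple graph, B B^T = D + A.
assert np.allclose(B @ B.T, D + A)

# AMF-style prediction: rank unobserved pairs (u < v, A[u, v] == 0) by score.
S = amf_scores(A, rank=2)
candidates = [(u, v) for u in range(n) for v in range(u + 1, n) if A[u, v] == 0]
ranked = sorted(candidates, key=lambda p: S[p], reverse=True)
print(ranked)
```

Note that `A` has n² entries regardless of how many edges are observed, while `B` grows by one column per observed edge; in a very sparse graph, almost all of `A` consists of zeros that the AMF baseline treats as observed non-links.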
