Link Prediction in Large Networks by Comparing the Global View of Nodes in the Network

Link prediction is an important and well-studied problem in network analysis, with a broad range of applications including recommender systems, anomaly detection, and denoising. The general principle in link prediction is to use the topological characteristics of the nodes in the network to predict edges that might be added to or removed from the network. While early research utilized the local network neighborhood to characterize the topological relationship between pairs of nodes, recent studies increasingly show that the use of global network information improves prediction performance. Meanwhile, in the context of disease gene prioritization and functional annotation in computational biology, "global topological similarity" based methods are shown to be effective and robust to noise and ascertainment bias. These methods compute topological profiles that represent the global view of the network from the perspective of each node and compare these topological profiles to assess the topological similarity between nodes. Here, we show that, in the context of link prediction in large networks, the performance of these global-view based methods can be adversely affected by high dimensionality. Motivated by this observation, we propose two dimensionality reduction techniques that exploit the sparsity and modularity of networks that are encountered in practical applications. Our experimental results on predicting future collaborations based on a comprehensive co-authorship network show that dimensionality reduction renders global-view based link prediction highly effective, and the resulting algorithms significantly outperform state-of-the-art link prediction methods.
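
As a rough illustration of the "global view" idea described above, the following minimal sketch computes one topological profile per node and compares profiles to score candidate links. It rests on several assumptions that the abstract does not state: the profile is taken to be a random walk with restart vector, the comparison is Pearson correlation, and the restart probability of 0.3 and all function names are hypothetical. It does not attempt the two dimensionality reduction techniques, whose details are not given here.

```python
import numpy as np

def rwr_profiles(adj, restart=0.3):
    """Column i of the result is the random-walk-with-restart profile of node i.

    Assumption: the abstract does not fix how profiles are computed; random walk
    with restart is used here only as one common choice.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes: avoid division by zero
    walk = adj / col_sums                  # column-stochastic transition matrix
    # Closed form of p = (1 - restart) * walk @ p + restart * e_i, for all i at once.
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * walk)

def profile_similarity(profiles):
    """Pearson correlation between every pair of node profiles (assumed measure)."""
    return np.corrcoef(profiles.T)         # rows of profiles.T are per-node profiles

def rank_candidate_links(adj, similarity):
    """Return (score, i, j) for all non-adjacent pairs i < j, best score first."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    pairs = [(similarity[i, j], i, j)
             for i in range(n) for j in range(i + 1, n) if adj[i, j] == 0]
    return sorted(pairs, key=lambda t: t[0], reverse=True)

# Toy undirected network: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy_adj = np.zeros((6, 6))
for u, v in edges:
    toy_adj[u, v] = toy_adj[v, u] = 1.0

toy_ranked = rank_candidate_links(toy_adj, profile_similarity(rwr_profiles(toy_adj)))
for score, i, j in toy_ranked[:3]:
    print(f"candidate link {i}-{j}: similarity {score:.3f}")
```

Each profile in this sketch has one entry per node, so its length grows with the size of the network. That is the high dimensionality the abstract identifies as a problem for large networks.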
