Node similarity distribution of complex networks and its application in link prediction

Over the years, quantifying the similarity of nodes has been a hot topic in complex network research, yet little is known about the distribution of node similarity. In this paper, we consider a typical measure of node similarity, the common-neighbor-based similarity (CNS). Using generating functions, we propose a general framework for calculating the CNS distributions of node sets in various complex networks. In particular, we show that for the Erd\H{o}s-R\'{e}nyi (ER) random network, the CNS distribution of node sets of any given size obeys the Poisson law. We also connect the node-similarity distribution to the link prediction problem and find that the performance of link prediction depends solely on the CNS distributions of the connected and unconnected node pairs in the network. Furthermore, we derive theoretical solutions for two key evaluation metrics in link prediction: i) precision and ii) the area under the receiver operating characteristic curve (AUC). We show that for any link prediction method, if the similarity distributions of the connected and unconnected node pairs are identical, the AUC is $0.5$. These theoretical solutions are elegant alternatives to traditional experimental evaluation, at a much lower computational cost.
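
A minimal Python sketch of the ER result and of the AUC claim, under stated assumptions. We take the CNS of a node set $S$ to be the number of nodes adjacent to every member of $S$ (for a pair, the usual common-neighbor count). We take the AUC to be the probability that a randomly chosen connected pair scores higher than a randomly chosen unconnected pair, with ties counted as $1/2$. The helper names (`er_graph`, `cns`, `er_cns_mean`, `auc_from_distributions`) are hypothetical and are not taken from the paper. In $G(N,p)$, each of the other $N-k$ nodes is adjacent to all $k$ members of $S$ independently with probability $p^k$. The CNS therefore has generating function $(1-p^k+p^k x)^{N-k}$, which tends to the Poisson form $e^{\lambda(x-1)}$ with $\lambda=(N-k)p^k$ when $p^k \to 0$ at fixed $\lambda$. Because ER edges are independent, the presence of the edge $(i,j)$ does not affect the common neighbors of $i$ and $j$. Connected and unconnected pairs therefore share one CNS distribution, and the sketch should report an AUC near $0.5$.

```python
import math
import random
from collections import Counter
from itertools import combinations


def er_graph(n, p, seed=0):
    """Sample an Erdos-Renyi G(n, p) graph, returned as a list of neighbor sets."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def cns(adj, nodes):
    """Common-neighbor similarity of a node set: how many nodes are adjacent to every member."""
    return len(set.intersection(*(adj[v] for v in nodes)))


def er_cns_mean(n, p, k):
    """Poisson parameter lambda = (n - k) * p**k for node sets of size k in G(n, p)."""
    return (n - k) * p ** k


def poisson_pmf(s, lam):
    return math.exp(-lam) * lam ** s / math.factorial(s)


def empirical_distribution(values):
    """Normalised histogram {score: probability} of a list of integer scores."""
    counts = Counter(values)
    return {s: c / len(values) for s, c in counts.items()}


def auc_from_distributions(p_conn, p_unconn):
    """AUC = P(s1 > s0) + 0.5 * P(s1 == s0), with s1 ~ p_conn and s0 ~ p_unconn independent."""
    auc = 0.0
    for s1, w1 in p_conn.items():
        for s0, w0 in p_unconn.items():
            if s1 > s0:
                auc += w1 * w0
            elif s1 == s0:
                auc += 0.5 * w1 * w0
    return auc


# Split all node pairs of one ER graph into connected and unconnected, scoring each by CNS.
n, p = 400, 0.05
adj = er_graph(n, p)
conn_scores, unconn_scores = [], []
for i, j in combinations(range(n), 2):
    target = conn_scores if j in adj[i] else unconn_scores
    target.append(cns(adj, (i, j)))

lam = er_cns_mean(n, p, 2)
p_conn = empirical_distribution(conn_scores)
p_unconn = empirical_distribution(unconn_scores)

# Empirical CNS distribution of unconnected pairs next to the Poisson prediction.
for s in range(4):
    print(s, round(p_unconn.get(s, 0.0), 4), round(poisson_pmf(s, lam), 4))
print("AUC of CNS on ER:", round(auc_from_distributions(p_conn, p_unconn), 3))

# Identical distributions: by symmetry P(s1 > s0) = P(s0 > s1), so the AUC is 1/2.
pois = {s: poisson_pmf(s, lam) for s in range(30)}
print("AUC, identical distributions:", auc_from_distributions(pois, pois))
```

With a single sampled graph the empirical AUC fluctuates slightly around $0.5$. The last line evaluates two identical (truncated) Poisson distributions directly, where the symmetry of the tie-corrected comparison forces the value to $1/2$ up to floating-point rounding.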
