Accurate link prediction method based on path length between a pair of unlinked nodes and their degree

The link prediction problem has received much attention since the beginnings of social and behavioral sciences. For instance, social networks such as Facebook, Twitter, and LinkedIn change continually as new connections appear in the graph. For these networks, one of the biggest challenges is to accurately find the best recommendations for users. In graph terms, the main objective of the link prediction problem is to predict upcoming links from the current state of a graph. Link prediction methods use score functions, such as the Jaccard coefficient, the Katz index, and the Adamic-Adar metric, to measure the probability that a link will be added to the network. These metrics are widely used in various applications because of their simplicity and interpretability; however, the majority of them are designed for a specific domain. Social networks have become very large, with a large number of users connected by different kinds of links. Predicting those links remains a challenging task, as we need to find the best way to make predictions that are as accurate as possible. To this end, we extend our previous work (Jibouni et al. in 2018 6th international conference on wireless networks and mobile communications (WINCOM). IEEE, pp 1–6, 2018), in which we proposed a new node similarity measure based on the path depth between the source and destination nodes and their degrees. The topological features used are very easy to compute and very effective in solving the link prediction problem. In addition, we examine the impact of the path length l on the method's performance and show that the proposed method provides more accurate recommendations with path lengths 2 and 3. Then, we compare 13 state-of-the-art methods against the proposed method in terms of prediction performance using the area under the curve (AUC). The results on five instances of social networks show the efficiency of the proposed method in providing accurate recommendations.
Furthermore, we consider machine learning techniques such as K-nearest neighbors, logistic regression, artificial neural networks, decision trees, random forests, and support vector machines to solve the link prediction problem as a binary classification task. The results confirm the significant accuracy improvement that can be achieved using the proposed metric.
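
To make the ingredients named above concrete, the following is a minimal sketch of two of the standard score functions (Jaccard coefficient and Adamic-Adar), of counting paths of length 2 and 3 between an unlinked pair, and of the AUC comparison between held-out links and nonexistent links. It is not the proposed similarity measure, whose formula is not stated in this abstract. The toy graph, the held-out edge, and all function names are assumptions introduced only for illustration.

```python
import math

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Sum of 1/log(degree) over common neighbors.
    A common neighbor has degree >= 2, so the logarithm is positive."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])

def paths_of_length_2(adj, u, v):
    """Paths u-z-v, which is the number of common neighbors."""
    return len(adj[u] & adj[v])

def paths_of_length_3(adj, u, v):
    """Paths u-a-b-v for an UNLINKED pair (u, v) in a graph without self-loops."""
    return sum(len(adj[a] & adj[v]) for a in adj[u])

def auc(positive_scores, negative_scores):
    """Probability that a held-out link outscores a nonexistent link,
    counting ties as one half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy training graph (assumed data); the edge ("a", "d") is held out for testing.
train_edges = [("a", "b"), ("a", "c"), ("b", "c"),
               ("b", "d"), ("c", "d"), ("d", "e")]
held_out = [("a", "d")]
non_links = [("a", "e"), ("b", "e"), ("c", "e")]

adj = build_adjacency(train_edges)
pos = [jaccard(adj, u, v) for u, v in held_out]
neg = [jaccard(adj, u, v) for u, v in non_links]
jaccard_auc = auc(pos, neg)

print("Jaccard(a, d):", jaccard(adj, "a", "d"))
print("Adamic-Adar(a, d):", adamic_adar(adj, "a", "d"))
print("paths of length 2 and 3 for (a, d):",
      paths_of_length_2(adj, "a", "d"), paths_of_length_3(adj, "a", "d"))
print("AUC of the Jaccard score:", jaccard_auc)
```

In the classification setting, scores of this kind would serve as input features for each candidate pair, with the label indicating whether the link is present in the held-out set.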

[1]  M. Newman.  Clustering and preferential attachment in growing networks. , 2001, Physical review. E, Statistical, nonlinear, and soft matter physics.

[2]  Stanford,et al.  Learning to Discover Social Circles in Ego Networks , 2012 .

[3]  Linyuan Lu,et al.  Uncovering missing links with cold ends , 2011, ArXiv.

[4]  Matthieu De Beule,et al.  Small Worlds: The Dynamics of Networks between Order and Randomness , 1999 .

[5]  Linyuan Lü,et al.  Similarity index based on local paths for link prediction of complex networks. , 2009, Physical review. E, Statistical, nonlinear, and soft matter physics.

[6]  Yiming Yang,et al.  Introducing the Enron Corpus , 2004, CEAS.

[7]  M. Newman,et al.  Vertex similarity in networks. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[8]  A. Barabasi,et al.  Evolution of the social network of scientific collaborations , 2001, cond-mat/0104162.

[9]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[10]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[11]  Fernando Berzal Galiano,et al.  A Survey of Link Prediction in Complex Networks , 2016, ACM Comput. Surv..

[12]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[13]  M E J Newman.  Assortative mixing in networks. , 2002, Physical review letters.

[14]  T. Snijders.  The degree variance: An index of graph heterogeneity , 1981 .

[15]  Francesco Folino,et al.  Link Prediction Approaches for Disease Networks , 2012, ITBAM.

[16]  Mohammad Al Hasan,et al.  Link prediction using supervised learning , 2006 .

[17]  Hsinchun Chen,et al.  Link prediction approach to collaborative filtering , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).

[18]  Hui Chen,et al.  A literature survey on smart cities , 2015, Science China Information Sciences.

[19]  Lada A. Adamic,et al.  Friends and neighbors on the Web , 2003, Soc. Networks.

[20]  D. Lusseau,et al.  The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations , 2003, Behavioral Ecology and Sociobiology.

[21]  Zheng Zhengzhong.  Link prediction using semi-supervised learning , 2012 .

[22]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[23]  A. Barabasi,et al.  Hierarchical Organization of Modularity in Metabolic Networks , 2002, Science.

[24]  Linyuan Lu,et al.  Link Prediction in Complex Networks: A Survey , 2010, ArXiv.

[25]  P. Jaccard,et al.  Etude comparative de la distribution florale dans une portion des Alpes et des Jura , 1901 .

[26]  Jure Leskovec,et al.  Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters , 2008, Internet Math..

[27]  Ahmed Hammouch,et al.  A novel parameter free approach for link prediction , 2018, 2018 6th International Conference on Wireless Networks and Mobile Communications (WINCOM).

[28]  Linyuan Lü,et al.  Predicting missing links via local information , 2009, 0901.0553.

[29]  Armelle Brun,et al.  Densifying a behavioral recommender system by social networks link prediction methods , 2011, Social Network Analysis and Mining.

[30]  Peng Wang,et al.  Link prediction in social networks: the state-of-the-art , 2014, Science China Information Sciences.

[31]  David Liben-Nowell,et al.  The link-prediction problem for social networks , 2007 .

[32]  V Latora,et al.  Efficient behavior of small-world networks. , 2001, Physical review letters.

[33]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[34]  Leo Katz,et al.  A new status index derived from sociometric analysis , 1953 .

[35]  Yang Xu,et al.  Data exchange similarity based on flow field for link prediction problem , 2016, 2016 Sixth International Conference on Information Science and Technology (ICIST).

[36]  Jure Leskovec,et al.  Learning to Discover Social Circles in Ego Networks , 2012, NIPS.

[37]  T. Sørensen,et al.  A method of establishing group of equal amplitude in plant sociobiology based on similarity of species content and its application to analyses of the vegetation on Danish commons , 1948 .