Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics

The number of social-based online systems is growing rapidly. The vast amounts of data gathered in these systems bring new challenges for analysis. One intensively researched topic is the prediction of social connections between users. Although much effort has gone into developing new prediction approaches, the existing methods have not been comprehensively analysed. In this paper we investigate the correlation between network metrics and the accuracy of different prediction methods. We selected six time-stamped real-world social networks and the ten most widely used link prediction methods. The experimental results show that the performance of some methods correlates strongly with certain network metrics. We were able to distinguish "prediction friendly" networks, for which most of the prediction methods perform well, from "prediction unfriendly" networks, for which most of the methods yield high prediction error. Correlation analysis between network metrics and the accuracy of prediction methods may form the basis of a metalearning system that recommends the right prediction method for a given network based on its characteristics.
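
To make the evaluation setting concrete, the following is a minimal sketch of how a single link prediction method could be scored on a time-stamped network. It assumes a common-neighbours score and an AUC-style accuracy measure; the abstract does not name the ten methods or the accuracy measure, so both choices are assumptions made for illustration. The toy graph, the node names, and the helper functions `build_adj`, `common_neighbours`, and `auc` are hypothetical and do not come from the paper.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbours(adj, u, v):
    """Score a candidate link by the number of shared neighbours."""
    return len(adj[u] & adj[v])


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy network: edges observed in an earlier time window ...
train_edges = [("a", "b"), ("a", "c"), ("b", "c"),
               ("b", "d"), ("c", "d"), ("d", "e")]
# ... and edges that appear only in a later time window.
future_edges = {frozenset(("a", "d")), frozenset(("c", "e"))}

adj = build_adj(train_edges)
observed = {frozenset(e) for e in train_edges}

# Candidate links are all node pairs not connected in the earlier window.
candidates = [frozenset(pair) for pair in combinations(sorted(adj), 2)
              if frozenset(pair) not in observed]

pos = [common_neighbours(adj, *sorted(c)) for c in candidates if c in future_edges]
neg = [common_neighbours(adj, *sorted(c)) for c in candidates if c not in future_edges]

auc_value = auc(pos, neg)
print(f"AUC of common neighbours on the toy network: {auc_value}")
```

Repeating such a computation for each method and each network would give one accuracy value per (method, network) pair, which could then be correlated with network metrics in the way the abstract describes.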

[1]  Frank Harary,et al.  Graph Theory , 2016 .

[2]  T. Sørensen,et al.  A method of establishing group of equal amplitude in plant sociobiology based on similarity of species content and its application to analyses of the vegetation on Danish commons , 1948 .

[3]  Alessandro Vespignani,et al.  Epidemic spreading in scale-free networks. , 2000, Physical review letters.

[4]  Yiming Yang,et al.  The Enron Corpus: A New Dataset for Email Classi(cid:12)cation Research , 2004 .

[5]  Hsinchun Chen,et al.  Link prediction approach to collaborative filtering , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).

[6]  Gene H. Golub,et al.  Matrix computations (3rd ed.) , 1996 .

[7]  N. Christakis,et al.  The Spread of Obesity in a Large Social Network Over 32 Years , 2007, The New England journal of medicine.

[8]  M. Newman,et al.  Vertex similarity in networks. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[9]  V. Latora,et al.  Complex networks: Structure and dynamics , 2006 .

[10]  Zhen Liu,et al.  Link prediction in complex networks: A local naïve Bayes model , 2011, ArXiv.

[11]  T. Zhou,et al.  Effective and Efficient Similarity Index for Link Prediction of Complex Networks , 2009, 0905.3558.

[12]  M. Newman,et al.  Hierarchical structure and the prediction of missing links in networks , 2008, Nature.

[13]  Mei Liu,et al.  Prediction of protein-protein interactions using random decision forest framework , 2005, Bioinform..

[14]  David Liben-Nowell,et al.  The link-prediction problem for social networks , 2007 .

[15]  Bo Yang,et al.  Graph-based features for supervised link prediction , 2011, The 2011 International Joint Conference on Neural Networks.

[16]  Christos Faloutsos,et al.  Epidemic spreading in real networks: an eigenvalue viewpoint , 2003, 22nd International Symposium on Reliable Distributed Systems, 2003. Proceedings..

[17]  Stanley Milgram,et al.  An Experimental Study of the Small World Problem , 1969 .

[18]  Jérôme Kunegis,et al.  Fairness on the web: alternatives to the power law , 2012, WebSci '12.

[19]  Katarzyna Musial,et al.  Molecular model of dynamic social network based on e-mail communication , 2013, Social Network Analysis and Mining.

[20]  Guido Caldarelli,et al.  The Evolution of Complex Networks: A New Framework , 2012, ArXiv.

[21]  Aleksander Zgrzywa,et al.  Control and Cybernetics Evaluation of Node Position Based on Email Communication * , 2022 .

[22]  Paul Erdös,et al.  On random graphs, I , 1959 .

[23]  P. Erdos,et al.  On the evolution of random graphs , 1984 .

[24]  Igor Jurisica,et al.  Protein complex prediction via cost-based clustering , 2004, Bioinform..

[25]  M. Newman Clustering and preferential attachment in growing networks. , 2001, Physical review. E, Statistical, nonlinear, and soft matter physics.

[26]  Krishna P. Gummadi,et al.  Growth of the flickr social network , 2008, WOSN '08.

[27]  Jérôme Kunegis,et al.  KONECT: the Koblenz network collection , 2013, WWW.

[28]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[29]  Katarzyna Musial,et al.  Link Prediction Based on Subgraph Evolution in Dynamic Social Networks , 2011, 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing.

[30]  Ricardo B. C. Prud Supervised Link Prediction in Weighted Networks , 2011 .

[31]  Krishna P. Gummadi,et al.  On the evolution of user interaction in Facebook , 2009, WOSN '09.

[32]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..

[33]  A. Barabasi,et al.  Lethality and centrality in protein networks , 2001, Nature.

[34]  Peter Druschel,et al.  Online social networks: measurement, analysis, and applications to distributed information systems , 2009 .

[35]  Linyuan Lü,et al.  Similarity index based on local paths for link prediction of complex networks. , 2009, Physical review. E, Statistical, nonlinear, and soft matter physics.

[36]  Lada A. Adamic,et al.  Friends and neighbors on the Web , 2003, Soc. Networks.

[37]  N. Biggs,et al.  Graph Theory 1736-1936 , 1976 .

[38]  Ferenc Molnár Link Prediction Analysis in the Wikipedia Collaboration Graph , 2011 .

[39]  Tore Opsahl Triadic closure in two-mode networks: Redefining the global and local clustering coefficients , 2013, Soc. Networks.

[40]  Robert B. Russell,et al.  InterPreTS: protein Interaction Prediction through Tertiary Structure , 2003, Bioinform..

[41]  Zohreh Azimifar,et al.  Degrees of Separation in Social Networks , 2011, SOCS.

[42]  Linyuan Lu,et al.  Link Prediction in Complex Networks: A Survey , 2010, ArXiv.

[43]  Nitesh V. Chawla,et al.  New perspectives and methods in link prediction , 2010, KDD.

[44]  Rami Puzis,et al.  Link Prediction in Social Networks Using Computationally Efficient Topological Features , 2011, 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing.

[45]  Sharon L. Milgram,et al.  The Small World Problem , 1967 .

[46]  Ole J. Mengshoel,et al.  Will We Connect Again? Machine Learning for Link Prediction in Mobile Social Networks , 2013, MLG 2013.

[47]  Mohammad Al Hasan,et al.  Link prediction using supervised learning , 2006 .

[48]  Duncan J. Watts,et al.  The Structure and Dynamics of Networks: (Princeton Studies in Complexity) , 2006 .

[49]  Thomas Hofmann,et al.  Stochastic Relational Models for Discriminative Link Prediction , 2007 .

[50]  Christos Faloutsos,et al.  Epidemic thresholds in real networks , 2008, TSEC.

[51]  B. Bollobás The evolution of random graphs , 1984 .

[52]  Leo Katz,et al.  A new status index derived from sociometric analysis , 1953 .

[53]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[54]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[55]  J. Rodgers,et al.  Thirteen ways to look at the correlation coefficient , 1988 .

[56]  A. Barabasi,et al.  Hierarchical Organization of Modularity in Metabolic Networks , 2002, Science.