Evaluating link prediction methods

Link prediction is a popular research area with important applications in a variety of disciplines, including biology, social science, security, and medicine. The fundamental requirement of link prediction is the accurate and effective prediction of new links in networks. While there are many different methods proposed for link prediction, we argue that the practical performance potential of these methods is often unknown because of challenges in the evaluation of link prediction, which impact the reliability and reproducibility of results. We describe these challenges, provide theoretical proofs and empirical examples demonstrating how current methods lead to questionable conclusions, show how the fallacy of these conclusions is illuminated by methods we propose, and develop recommendations for consistent, standard, and applicable evaluation metrics. We also recommend the use of precision-recall threshold curves and associated areas in lieu of receiver operating characteristic curves due to complications that arise from extreme imbalance in the link prediction classification problem.
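The recommendation to prefer precision-recall analysis over ROC analysis under extreme imbalance can be illustrated with a small simulation. The sketch below is not the authors' experimental setup: the Gaussian score distributions, the class sizes, and the helper names `auroc`, `average_precision`, and `simulate` are all assumptions chosen for illustration, and average precision is used here only as a convenient single-number summary of the precision-recall curve.

```python
import random


def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic.

    Assumes continuous scores, so ties are not handled specially.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of the precision values measured at each true positive in the ranking."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for k, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / k
    return total / hits


def simulate(n_pos, n_neg, seed=0):
    """Score hypothetical links (label 1) and non-links (label 0).

    The score distributions are an assumption: positives ~ N(1.5, 1),
    negatives ~ N(0, 1), so the predictor is equally good in every run.
    """
    rng = random.Random(seed)
    labels = [1] * n_pos + [0] * n_neg
    scores = [rng.gauss(1.5, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    return auroc(labels, scores), average_precision(labels, scores)


# Same predictor quality, different class ratios (1:1 versus 1:250).
balanced_auroc, balanced_ap = simulate(200, 200)
imbalanced_auroc, imbalanced_ap = simulate(200, 50_000)

print(f"balanced:   AUROC={balanced_auroc:.3f}  AP={balanced_ap:.3f}")
print(f"imbalanced: AUROC={imbalanced_auroc:.3f}  AP={imbalanced_ap:.3f}")
```

Under these assumptions, the AUROC should stay roughly the same in both runs, while average precision should fall sharply once non-links vastly outnumber links. A ROC-based summary can therefore look reassuring even when most of the top-ranked predictions are false positives.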
