Theoretical Justification of Popular Link Prediction Heuristics

There are common intuitions about how social graphs are generated (for example, the informal notion that nearby nodes tend to share a link). There are also common heuristics for predicting whether two currently unlinked nodes in a graph should be linked (e.g., to suggest friends in an online social network or movies to customers in a recommendation network). This paper provides what we believe to be the first formal connection between these intuitions and these heuristics. We look at a familiar class of graph generation models in which nodes are associated with locations in a latent metric space and connections are more likely between closer nodes. We also look at popular link-prediction heuristics such as number-of-common-neighbors and its weighted variants [Adamic and Adar, 2003], which have proved successful in predicting missing links but are not directly derived from latent space graph models. We provide theoretical justifications for the success of some measures as compared to others, as reported in previous empirical studies. In particular, we present a sequence of formal results that bound the role that a node's degree plays in its usefulness for link prediction, the relative importance of short paths versus long paths, and the effect of increasing non-determinism in the link generation process on link prediction quality. Our results generalize to any model in which the latent space assumption holds.
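To make the two ingredients concrete, the following is a minimal sketch rather than the paper's own construction. It assumes nodes placed uniformly in the unit square and a logistic link probability that decays with distance; the function names and the parameters `radius`, `alpha`, and `seed` are hypothetical choices for illustration. The two scores are the standard common-neighbors count and the Adamic-Adar weighting, in which each common neighbor contributes the inverse logarithm of its degree.

```python
import math
import random
from itertools import combinations


def sample_latent_graph(n, radius=0.3, alpha=20.0, seed=0):
    """Sample an undirected graph from an assumed latent space model.

    Each node gets a uniform random position in the unit square. Two nodes
    are linked with probability 1 / (1 + exp(alpha * (d - radius))), so
    closer pairs are more likely to be connected. Larger alpha makes the
    model closer to a deterministic distance threshold at `radius`.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        d = math.dist(points[i], points[j])
        p = 1.0 / (1.0 + math.exp(alpha * (d - radius)))
        if rng.random() < p:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return points, neighbors


def common_neighbors(neighbors, i, j):
    """Number of nodes adjacent to both i and j."""
    return len(neighbors[i] & neighbors[j])


def adamic_adar(neighbors, i, j):
    """Adamic-Adar score: common neighbors weighted by 1 / log(degree).

    A common neighbor of two distinct nodes has degree at least 2, so the
    logarithm is strictly positive and the division is safe.
    """
    shared = neighbors[i] & neighbors[j]
    return sum(1.0 / math.log(len(neighbors[z])) for z in shared)


if __name__ == "__main__":
    points, nbrs = sample_latent_graph(200)
    # Rank currently unlinked pairs by Adamic-Adar score, highest first.
    candidates = [
        (adamic_adar(nbrs, i, j), common_neighbors(nbrs, i, j), i, j)
        for i, j in combinations(range(len(points)), 2)
        if j not in nbrs[i]
    ]
    candidates.sort(reverse=True)
    for score, cn, i, j in candidates[:5]:
        print(i, j, cn, round(score, 3), round(math.dist(points[i], points[j]), 3))
```

Under these assumptions, one would expect the top-ranked unlinked pairs to lie close together in the latent space, and low-degree common neighbors to carry more weight than high-degree ones in the Adamic-Adar score.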

[1] Leo Katz, et al. A new status index derived from sociometric analysis, 1953.

[2] R. Alba, et al. Bonds of Pluralism: The Form and Substance of Urban Social Networks, 1974.

[3] Katherine Faust. Comparison of methods for positional analysis: Structural and general equivalences, 1988.

[4] K. Faust. Comparison of methods for positional analysis: Structural and general equivalences, 1988.

[5] Jon M. Kleinberg, et al. The small-world phenomenon: an algorithmic perspective, 2000, STOC '00.

[6] M. McPherson, et al. Birds of a Feather: Homophily in Social Networks, 2001.

[7] Prabhakar Raghavan, et al. Social Networks: From the Web to the Enterprise, 2002, IEEE Internet Comput.

[8] Peter D. Hoff, et al. Latent Space Approaches to Social Network Analysis, 2002.

[9] O. Haggstrom. Reversible Markov chains, 2002.

[10] Jon Kleinberg, et al. The link prediction problem for social networks, 2003, CIKM '03.

[11] Jennifer Widom, et al. Scaling personalized web search, 2003, WWW '03.

[12] Lada A. Adamic, et al. Friends and neighbors on the Web, 2003, Soc. Networks.

[13] Hsinchun Chen, et al. CrimeLink Explorer: Using Domain Knowledge to Facilitate Automated Crime Association Analysis, 2003, ISI.

[14] Matthew Brand, et al. A Random Walks Perspective on Maximizing Satisfaction and Profit, 2005, SDM.

[15] Jon M. Kleinberg, et al. The link-prediction problem for social networks, 2007, J. Assoc. Inf. Sci. Technol.

[16] Purnamrita Sarkar, et al. A Tractable Approach to Finding Closest Truncated-commute-time Neighbors in Large Graphs, 2007, UAI.

[17] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.