A Survey of Link Prediction in Social Networks

Link prediction is an important task in analyzing social networks, and it also has applications in other domains such as information retrieval, bioinformatics, and e-commerce. A variety of techniques exist for link prediction, ranging from feature-based classification and kernel-based methods to matrix factorization and probabilistic graphical models. These methods differ from each other with respect to model complexity, prediction performance, scalability, and generalization ability. In this article, we survey some representative link prediction methods, categorizing them by the type of model. We largely consider three types of models: first, the traditional (non-Bayesian) models, which extract a set of features to train a binary classification model; second, the probabilistic approaches, which model the joint probability among the entities in a network using Bayesian graphical models; and finally, the linear algebraic approach, which computes the similarity between the nodes in a network using rank-reduced similarity matrices. We discuss various existing link prediction models that fall into these broad categories and analyze their strengths and weaknesses. We conclude the survey with a discussion of recent developments and future research directions.
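
To make the first category concrete, the following is a minimal sketch of how features for a candidate node pair might be extracted before training a binary classifier. The specific measures (common neighbors, Jaccard coefficient, Adamic-Adar, preferential attachment) are commonly used neighborhood-based scores chosen here for illustration; the abstract does not name them. The function names `build_adjacency` and `pair_features` and the toy edge list are hypothetical.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_features(adj, u, v):
    """Compute neighborhood-based features for the candidate pair (u, v).

    These are illustrative features only; a real system would typically add
    path-based and attribute-based features as well.
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: rarer (low-degree) shared neighbors count for more.
        # The degree guard avoids dividing by log(1) = 0.
        "adamic_adar": sum(
            1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1
        ),
        "preferential_attachment": len(nu) * len(nv),
    }


# Toy graph (hypothetical data): is a link between "a" and "d" plausible?
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
features = pair_features(adj, "a", "d")
print(features)
```

In a supervised setting, one such feature vector would be computed for each candidate pair, labeled by whether the link appears in a later snapshot of the network, and passed to any standard binary classifier.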
