Chapter 1 LINK PREDICTION IN SOCIAL NETWORKS

Link prediction is an important task in social network analysis, with applications in other domains such as information retrieval, bioinformatics, and e-commerce. A variety of techniques exist for link prediction, ranging from feature-based classification and kernel-based methods to matrix factorization and probabilistic graphical models. These methods differ in model complexity, prediction performance, scalability, and generalization ability. In this article, we survey representative link prediction methods, categorized by model type. We consider three broad types of models: first, traditional (non-Bayesian) models, which extract a set of features to train a binary classification model; second, probabilistic approaches, which model the joint probability among the entities in a network using Bayesian graphical models; and finally, linear algebraic approaches, which compute the similarity between the nodes in a network using rank-reduced similarity matrices. We discuss existing link prediction models in each of these categories and analyze their strengths and weaknesses. We conclude the survey with a discussion of recent developments and future research directions.
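To make the first category concrete, the following is a minimal sketch of extracting topological features for unlinked node pairs, which a binary classifier could then consume. The two features used here (common-neighbor count and Jaccard coefficient), the toy edge list, and all function names are illustrative assumptions, not taken from the chapter.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_features(adj, u, v):
    """Compute two simple topological features for the node pair (u, v).

    Both features are illustrative choices (assumptions), not a fixed
    feature set prescribed by the surveyed methods.
    """
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
    }


def rank_candidates(adj):
    """Score every currently unlinked pair and sort by descending score."""
    scored = []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scored.append(((u, v), pair_features(adj, u, v)))
    scored.sort(
        key=lambda item: (-item[1]["common_neighbors"], -item[1]["jaccard"])
    )
    return scored


# Hypothetical toy network used only for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
ranking = rank_candidates(adj)

for pair, feats in ranking:
    print(pair, feats)
```

In a supervised setting, feature vectors like these would be paired with labels (link formed or not in a later snapshot) and passed to a classifier. Here they are only used to rank candidate pairs directly.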
