Link prediction using supervised learning

Social network analysis has attracted much attention in recent years, and link prediction is a key research direction within this area. In this research, we study link prediction as a supervised learning task. Along the way, we identify a set of features that are key to superior performance under the supervised learning setup. The identified features are easy to compute and, at the same time, surprisingly effective in solving the link prediction problem. We also explain the effectiveness of the features from their class density distributions. We then compare different classes of supervised learning algorithms in terms of prediction performance using various metrics, such as accuracy, precision-recall, F-value, and squared error, with 5-fold cross-validation. Our results on two practical social network datasets show that most of the well-known classification algorithms (decision tree, k-NN, multilayer perceptron, SVM, RBF network) predict links with strong performance, but SVM outperforms all of them by a narrow margin on every performance measure. Finally, ranking the features with popular feature-ranking algorithms shows that a small subset of features consistently plays a significant role in link prediction.
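The abstract does not list the features or the evaluation code, so the following is only a minimal sketch of the general supervised setup, under stated assumptions. Each candidate node pair becomes a feature vector computed from the training graph, a classifier (decision tree, SVM, and so on; omitted here) is trained on labeled pairs, and predictions are scored with the metrics named above across 5 folds. The feature set (`degree_sum`, `common_neighbors`, `jaccard`, `adamic_adar`, `shortest_distance`) is a hypothetical choice of common topological features from the link-prediction literature, not the paper's feature list; the helper names, the 0.5 decision threshold, and the contiguous (unshuffled) folds are likewise assumptions.

```python
import math
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_distance(adj, src, dst):
    """Hop count between src and dst via BFS; math.inf if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return math.inf


def pair_features(adj, u, v):
    """Illustrative feature vector for a candidate (u, v) pair.

    Assumption: these topological features are examples only,
    not the feature set used in the paper.
    """
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "degree_sum": len(nu) + len(nv),
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # A common neighbour touches both u and v, so its degree is >= 2
        # and log(degree) > 0 (no division by zero).
        "adamic_adar": sum(1.0 / math.log(len(adj[z])) for z in common),
        "shortest_distance": shortest_distance(adj, u, v),
    }


def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, F1 and mean squared error for 0/1 labels.

    Assumption: a score >= threshold is predicted as a future link.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mse = sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true)
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "squared_error": mse}


def k_fold_indices(n, k=5):
    """Yield k (train, test) index splits over range(n), contiguous folds."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        train = [j for j in range(n) if j < bounds[i] or j >= bounds[i + 1]]
        yield train, test
```

For example, on the toy graph with edges a-b, a-c, b-c, c-d, d-e, the pair (a, d) has one common neighbour (c), a Jaccard score of 1/3, and a shortest distance of 2. In a real experiment the folds would usually be shuffled or stratified, since positive pairs (links that appear later) are typically far rarer than negative ones.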
