sonLP: Social network link prediction by principal component regression

Social networks are driven by social interaction and are therefore dynamic. When a network is modeled as a graph, nodes and links are continually added and deleted, and there is considerable interest in social network analysis in predicting link formation. Current work has not adequately addressed three issues: (1) Most link predictors start from features of the link topology. How do features in other dimensions of the social network data affect link formation? (2) The dynamic nature of social networks implies that the features driving link formation are constantly changing. How can a predictor automatically select the features that are important for link formation? (3) Node pairs that are not linked can outnumber links by orders of magnitude, but previous work does not address this imbalance. How can we design a predictor that is robust with respect to link imbalance? This paper presents sonLP, a social network link predictor. It uses principal component analysis to identify features that are important to link prediction, its tradeoff between true and false positives is near optimal for a wide range of link imbalance, and it has optimal time complexity. Experiments with coauthorship prediction in the ACM researcher community also show the importance of using features outside the links' dimension.
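
To make the idea of principal component regression on node-pair features concrete, the following is a minimal sketch and not the sonLP implementation, whose details the abstract does not give. It assumes a hypothetical feature matrix with one row per node pair, mixing one topological feature (common neighbours) with one non-topological feature (shared keywords) and a pure-noise column, and a 0/1 label marking whether the pair is linked. All function names, the synthetic data, the 1% link rate, and the decision threshold are illustrative assumptions.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Principal component regression: PCA on standardized X, then least squares on the scores."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z = (X - mean) / std
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    scores = Z @ components.T
    # Ordinary least squares with an intercept on the retained component scores.
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "std": std, "components": components, "coef": coef}

def predict_pcr(model, X):
    """Return a real-valued link score for each node pair (row of X)."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["components"].T
    return model["coef"][0] + scores @ model["coef"][1:]

def tpr_fpr(scores, y, threshold):
    """True-positive and false-positive rates when predicting a link for scores >= threshold."""
    predicted = scores >= threshold
    tpr = predicted[y == 1].mean()
    fpr = predicted[y == 0].mean()
    return float(tpr), float(fpr)

def make_synthetic_pairs(n_pairs=5000, link_rate=0.01, seed=0):
    """Hypothetical imbalanced data: roughly 1% of node pairs are linked."""
    rng = np.random.default_rng(seed)
    y = (rng.random(n_pairs) < link_rate).astype(float)
    common_neighbours = rng.poisson(1.0 + 3.0 * y)  # assumed topological feature
    shared_keywords = rng.poisson(2.0 + 4.0 * y)    # assumed non-topological feature
    noise = rng.normal(size=n_pairs)                # uninformative feature
    X = np.column_stack([common_neighbours, shared_keywords, noise]).astype(float)
    return X, y

X, y = make_synthetic_pairs()
model = fit_pcr(X, y, n_components=2)
scores = predict_pcr(model, X)
# The threshold is an arbitrary illustrative choice, not a tuned value.
print(tpr_fpr(scores, y, threshold=0.05))
```

Sweeping the threshold and recording each (false-positive rate, true-positive rate) pair traces out the kind of tradeoff curve the abstract refers to. Because both rates are computed within their own class, the curve can be compared across different ratios of unlinked to linked pairs.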
