Link Prediction in Social Networks Using Computationally Efficient Topological Features

Online social networking sites have become increasingly popular over the last few years. As a result, new interdisciplinary research directions have emerged in which social network analysis methods are applied to networks containing hundreds of millions of users. Unfortunately, links between individuals may be missing, either because of imperfect data acquisition or because they are not yet reflected in the online network (i.e., friends in the real world have not formed a virtual connection). Existing link prediction techniques lack the scalability required for full application to a continuously growing social network, which may add users with thousands of connections every day. The primary bottleneck in link prediction techniques is extracting the structural features required for classifying links. In this paper we propose a set of simple, easy-to-compute structural features that can be analyzed to identify missing links. We show that a machine learning classifier trained on the proposed features can successfully identify missing links, even when applied to the hard problem of classifying links between individuals who have at least one common friend. A new friends measure that we developed is shown to be a good predictor of missing links. An evaluation experiment was performed on five large social network datasets: Facebook, Flickr, YouTube, Academia, and TheMarker. Our methods can give social network site operators the capability to help users find known offline contacts and discover new friends online. They may also be used to expose hidden links in an online social network.
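To make the idea of easy-to-compute structural features concrete, the following is a minimal sketch in Python. It is not the paper's implementation. The toy graph, the function names, and the choice of features are assumptions made for illustration, and the abstract does not define the friends measure. The version here counts pairs of neighbors of u and v that are either the same person or directly connected, which is one plausible reading of such a measure.

```python
from itertools import product


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_friends(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def friends_measure(adj, u, v):
    """Assumed definition: count pairs (x, y), x a neighbor of u and
    y a neighbor of v, where x == y or x and y are linked."""
    return sum(
        1
        for x, y in product(adj[u], adj[v])
        if x == y or y in adj[x]
    )


# Hypothetical toy network; "a" and "d" are not linked but share friend "c".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("b", "e"), ("d", "e")]
adj = build_adjacency(edges)

# Feature vector for the candidate link (a, d), ready to feed to a classifier.
features = [
    common_friends(adj, "a", "d"),
    jaccard(adj, "a", "d"),
    friends_measure(adj, "a", "d"),
]
print(features)
```

Each feature touches only the immediate neighborhoods of the two endpoints, so the cost per candidate pair depends on their degrees rather than on the size of the whole network. Feature vectors like this one could then be labeled (link present or absent) and passed to any standard supervised classifier.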
