Link Prediction and Topological Feature Importance in Social Networks

Link prediction is the problem of accounting for how the connection structure of a graph develops over time. It has many applications, such as predicting missing links and future links in online social networks. Much of the literature has focused on a limited set of topological characteristics or on node attributes, rather than on a broad range of measures. Yet a graph has a rich spectrum of associated topological features, such as neighbourhood similarity scores, node centrality measures, community structure and path-based distance measures. In this paper we formulate a supervised learning approach to link prediction using a feature set of graph measures chosen to capture a wide range of topological structure. The advantage of this approach is that it can be applied to any graph whose connection structure is known. We use random forest models because of their high accuracy and their built-in measures of feature importance. The feature importance scores reveal how strongly each topological predictor contributes to link prediction in a variety of synthetically generated network datasets, as well as in three real-world citation networks. We investigate both the undirected and the directed case. Our results show that this approach can achieve very high precision and recall on certain graphs, and good performance in general. Our models also consistently outperform a simpler comparison model that we developed to resemble earlier work. In addition, our analysis of variable importance for each dataset reveals meaningful information about deep network properties.
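To make the supervised formulation concrete, here is a minimal sketch of how topological features might be computed for candidate node pairs in an undirected graph. It is not the authors' implementation. Several things are assumptions made for illustration. The feature subset (common neighbours, Jaccard, Adamic-Adar, preferential attachment and a capped BFS shortest-path distance) is only a sample; centrality and community features are omitted. The evaluation scheme holds out a fraction of edges as positives and samples an equal number of non-edges as negatives. All function names (`adjacency`, `bfs_distance`, `pair_features`, `build_dataset`) are hypothetical. The directed case is not covered.

```python
import math
import random
from collections import deque


def adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbours) from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distance(adj, source, target, max_depth=6):
    """Hop distance from source to target; returns max_depth + 1 if not reached."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for nb in adj.get(node, ()):
            if nb == target:
                return depth + 1
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return max_depth + 1


def pair_features(adj, u, v):
    """Hypothetical feature vector: neighbourhood similarity scores plus a path distance."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common, union = nu & nv, nu | nv
    return {
        "common_neighbours": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: rarer shared neighbours count more; skip degree-1 nodes (log 1 = 0).
        "adamic_adar": sum(1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1),
        "preferential_attachment": len(nu) * len(nv),
        "shortest_path": bfs_distance(adj, u, v),
    }


def build_dataset(edges, holdout_frac=0.2, seed=0):
    """Assumed scheme: hold out edges as positives, sample equally many non-edges
    as negatives, and compute features on the remaining (observed) graph.
    Requires at least as many non-edges as held-out edges, or the loop never ends."""
    rng = random.Random(seed)
    edges = sorted({tuple(sorted(e)) for e in edges if e[0] != e[1]})
    rng.shuffle(edges)
    n_hold = int(len(edges) * holdout_frac)
    held, observed = edges[:n_hold], edges[n_hold:]
    nodes = sorted({n for e in edges for n in e})
    adj = {n: set() for n in nodes}  # keep nodes whose only edges were held out
    for a, b in observed:
        adj[a].add(b)
        adj[b].add(a)
    edge_set = set(edges)
    negatives = set()
    while len(negatives) < n_hold:
        pair = tuple(sorted(rng.sample(nodes, 2)))
        if pair not in edge_set:
            negatives.add(pair)
    rows = [(pair_features(adj, a, b), 1) for a, b in held]
    rows += [(pair_features(adj, a, b), 0) for a, b in sorted(negatives)]
    return rows
```

For example, on the graph with edges (0,1), (0,2), (1,2), (2,3), (3,4), `pair_features(adjacency(edges), 0, 3)` reports one common neighbour (node 2), a Jaccard score of 1/3, a preferential-attachment score of 4 and a shortest-path distance of 2. Features for held-out pairs are computed on the observed graph, so the target edge cannot leak into the distance feature. The labelled rows could then be passed to a random forest classifier. One option is scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute gives impurity-based importance scores, though the abstract does not say which implementation the authors used.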