Predicting and Identifying Missing Node Information in Social Networks

In recent years, social networks have surged in popularity. One key aspect of social network research is identifying important missing information that is not explicitly represented in the network, or is not visible to all. To date, this line of research has typically focused on finding the connections that are missing between nodes, a challenge commonly termed the link prediction problem. This article introduces the missing node identification problem, where missing members in the social network structure must be identified. In this problem, indications of missing nodes are assumed to exist. Given these indications and a partial network, we must assess which indications originate from the same missing node and determine the full network structure. Toward solving this problem, we present the missing node identification by spectral clustering algorithm (MISC), an approach that combines spectral clustering with pairwise node affinity measures adopted from link prediction research. We evaluate the performance of our approach in different problem settings and scenarios, using real-life data from Facebook. The results show that our approach can be effective in solving the missing node identification problem. This article also presents R-MISC, which uses a sparse matrix representation, efficient algorithms for calculating the nodes’ pairwise affinity, and a proprietary dimension reduction technique to scale the MISC algorithm to large networks of more than 100,000 nodes. Finally, we consider problem settings where some of the indications are unknown. Two algorithms are suggested for this problem: speculative MISC, based on MISC, and missing link completion, based on classical link prediction literature. We show that speculative MISC outperforms missing link completion.
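To make the affinity step concrete, the following is a minimal Python sketch of how a pairwise affinity matrix over indications might be built before it is handed to a spectral clustering routine. It is an illustration under stated assumptions, not the article's implementation: the choice of Jaccard similarity over closed neighborhoods, the toy graph, and the names `closed_neighborhood`, `jaccard`, `placeholder_affinity`, and `anchors` are all hypothetical. Each indication is assumed to be attached to the visible node where it was observed, and the clustering step itself is omitted.

```python
from itertools import combinations


def closed_neighborhood(adj, node):
    """Return the node together with its visible neighbors."""
    return adj[node] | {node}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def placeholder_affinity(adj, anchors):
    """Build a symmetric affinity matrix over indications.

    anchors[i] is the visible node where indication i was observed
    (an assumption of this sketch). The affinity of two indications is
    the Jaccard similarity of their anchors' closed neighborhoods.
    """
    n = len(anchors)
    aff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        aff[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        score = jaccard(closed_neighborhood(adj, anchors[i]),
                        closed_neighborhood(adj, anchors[j]))
        aff[i][j] = aff[j][i] = score
    return aff


# Toy visible network: two triangles (a, b, c) and (d, e, f) joined by c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Four indications, observed at nodes a, c, d and f respectively.
anchors = ["a", "c", "d", "f"]
affinity = placeholder_affinity(adj, anchors)

for row in affinity:
    print(["%.2f" % x for x in row])
```

In this toy graph, indications anchored in the same triangle receive a higher affinity than those anchored in different triangles, which is the kind of signal a clustering step could use to decide which indications originate from the same missing node.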
