DeepNC: Deep Generative Network Completion

Most network data are collected from partially observable networks with both missing nodes and missing edges, for example, due to limited resources and privacy settings specified by users on social media. Thus, it stands to reason that inferring the missing parts of the networks by performing network completion should precede downstream applications. However, despite this need, the recovery of missing nodes and edges in such incomplete networks remains insufficiently explored because it is difficult to model: it is much more challenging than link prediction, which infers only missing edges. In this paper, we present DeepNC, a novel method for inferring the missing parts of a network based on a deep generative model of graphs. Specifically, our method first learns a likelihood over edges via an autoregressive generative model, and then identifies the graph that maximizes the learned likelihood conditioned on the observable graph topology. Moreover, we propose a computationally efficient DeepNC algorithm that consecutively finds individual nodes that maximize the probability in each node generation step, as well as an enhanced version using the expectation-maximization algorithm. The runtime complexities of both algorithms are shown to be almost linear in the number of nodes in the network. We empirically demonstrate the superiority of DeepNC over state-of-the-art network completion approaches.
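
To make the idea of likelihood-guided, node-by-node completion concrete, the following is a minimal sketch and not the DeepNC algorithm itself. It assumes a toy, hand-written edge probability (`edge_prob`, a hypothetical stand-in for a learned autoregressive model) and a known number of missing nodes. The function name `greedy_complete` and the degree-based probability are illustrative assumptions only. Each missing node is added in turn, and each candidate edge is kept when that choice is the more likely one given the graph built so far.

```python
import math


def edge_prob(v, adj):
    """Hypothetical stand-in for a learned edge likelihood.

    Assumption for illustration only: the probability that a newly added
    node links to an existing node v grows with v's current degree.
    """
    deg = len(adj[v])
    return deg / (deg + 2)


def greedy_complete(observed, num_missing):
    """Add `num_missing` nodes one at a time to a non-empty observed graph.

    For each candidate edge, keep the more likely outcome (edge or no edge)
    under `edge_prob`, conditioned on the graph built so far, and accumulate
    the log-likelihood of the choices made.
    """
    # Copy the adjacency sets so the observed graph is left untouched.
    adj = {u: set(nbrs) for u, nbrs in observed.items()}
    log_lik = 0.0
    for _ in range(num_missing):
        existing = sorted(adj)      # nodes present before this step
        new = max(adj) + 1          # assumes integer node ids
        adj[new] = set()
        for v in existing:
            p = edge_prob(v, adj)   # depends on edges already decided
            if p > 0.5:
                adj[new].add(v)
                adj[v].add(new)
                log_lik += math.log(p)
            else:
                log_lik += math.log(1.0 - p)
    return adj, log_lik


# Observed graph: a star with hub 0 and leaves 1, 2, 3.
observed = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
completed, ll = greedy_complete(observed, num_missing=2)
```

With this toy probability, both inferred nodes attach only to the hub, since the hub is the only node whose link probability exceeds 0.5. A learned model would replace `edge_prob`, and the method described above additionally conditions on the observable topology rather than deciding each edge independently.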
