Graphgen-redux: A Fast and Lightweight Recurrent Model for Labeled Graph Generation

The problem of labeled graph generation is gaining attention in the Deep Learning community. The task is challenging due to the sparse and discrete nature of graph spaces. Several approaches have been proposed in the literature, most of which transform the graphs into sequences that encode their structure and labels, and then learn the distribution of such sequences with an auto-regressive generative model. Among this family of approaches, we focus on the Graphgen model. The preprocessing phase of Graphgen transforms graphs into unique edge sequences called Depth-First Search (DFS) codes, such that two isomorphic graphs are assigned the same DFS code. Each element of a DFS code is associated with a graph edge: specifically, it is a quintuple comprising one node identifier for each of the two endpoints, their node labels, and the edge label. Graphgen learns to generate such sequences auto-regressively and models the probability of each component of the quintuple independently. Although this approach is effective, its independence assumption is too loose to capture precisely the complex label dependencies of real-world graphs. By introducing a novel graph preprocessing approach, we are able to process the labeling information of both nodes and edges jointly. The corresponding model, which we term Graphgen-redux, improves upon the generative performance of Graphgen on a wide range of datasets of chemical and social graphs. In addition, it uses approximately 78% fewer parameters than the vanilla variant and requires 50% fewer epochs of training on average.
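To make the edge-sequence representation concrete, the following minimal Python sketch turns a small labeled graph into a sequence of quintuples, one per edge, in depth-first order. It is an illustration under stated assumptions, not the authors' preprocessing: it follows a single DFS traversal from a given start node and does not search for the canonical (minimum) DFS code that makes the sequence unique across isomorphic graphs. The function names `dfs_edge_sequence` and `to_joint_tokens`, the quintuple ordering, and the idea of collapsing the three labels of an edge into one vocabulary token are hypothetical choices made here only to show what treating node and edge labels jointly could look like.

```python
from typing import Dict, List, Tuple

# One sequence element per edge:
# (id of first endpoint, id of second endpoint, label_u, edge label, label_v)
Quintuple = Tuple[int, int, str, str, str]


def dfs_edge_sequence(
    node_labels: Dict[str, str],
    edge_labels: Dict[Tuple[str, str], str],
    start: str,
) -> List[Quintuple]:
    """Emit one quintuple per edge in DFS order.

    Assumption: this is a single traversal, NOT the minimum DFS code.
    """
    # Build an undirected adjacency list.
    adj: Dict[str, List[Tuple[str, str]]] = {n: [] for n in node_labels}
    for (u, v), label in edge_labels.items():
        adj[u].append((v, label))
        adj[v].append((u, label))

    timestamp = {start: 0}  # DFS discovery index used as node identifier
    seen_edges = set()
    sequence: List[Quintuple] = []

    def visit(u: str) -> None:
        for v, label in sorted(adj[u]):
            key = frozenset((u, v))
            if key in seen_edges:
                continue
            seen_edges.add(key)
            is_new = v not in timestamp
            if is_new:
                timestamp[v] = len(timestamp)
            sequence.append(
                (timestamp[u], timestamp[v], node_labels[u], label, node_labels[v])
            )
            if is_new:  # forward edge: keep descending
                visit(v)

    visit(start)
    return sequence


def to_joint_tokens(sequence: List[Quintuple]):
    """Hypothetical joint encoding: one token per (label_u, edge label, label_v)."""
    vocab: Dict[Tuple[str, str, str], int] = {}
    triples = []
    for t_u, t_v, l_u, l_e, l_v in sequence:
        token = vocab.setdefault((l_u, l_e, l_v), len(vocab))
        triples.append((t_u, t_v, token))
    return triples, vocab


# Toy triangle with two carbon-like nodes and one oxygen-like node.
nodes = {"a": "C", "b": "C", "c": "O"}
edges = {("a", "b"): "single", ("b", "c"): "double", ("a", "c"): "single"}
codes = dfs_edge_sequence(nodes, edges, start="a")
joint, vocab = to_joint_tokens(codes)
```

With the quintuple form, a model that treats the five components independently predicts each label separately at every step; with the collapsed form, a single prediction covers all three labels of an edge, so label combinations that never co-occur in the data never enter the vocabulary.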
