MolGrow: A Graph Normalizing Flow for Hierarchical Molecular Generation

We propose a hierarchical normalizing flow model for generating molecular graphs. The model produces new molecular structures from a single-node graph by recursively splitting every node into two. All operations are invertible and can be used as plug-and-play modules. The hierarchical nature of the latent codes allows for precise changes in the resulting graph: perturbations in the top layer cause global structural changes, while perturbations in subsequent layers change the resulting molecule only marginally. The proposed model outperforms existing generative graph models on the distribution learning task. We also report successful experiments on global and constrained optimization of chemical properties using the latent codes of the model.

Introduction

Drug discovery is a challenging multidisciplinary task that combines domain knowledge in chemistry, biology, and computational science. Recent works have demonstrated successful applications of machine learning to the drug development process, including synthesis planning (Segler, Preuss, and Waller 2018), protein folding (Senior et al. 2020), and hit discovery (Merk et al. 2018; Zhavoronkov et al. 2019). Advances in generative models have enabled applications of machine learning to drug discovery, such as distribution learning and molecular property optimization. Distribution learning models train on a large dataset to produce novel compounds (Polykovskiy et al. 2020); property optimization models search the chemical space for molecules with desirable properties (Brown et al. 2019). Researchers often combine these tasks: they first train a distribution learning model and then use its latent codes to optimize molecular properties (Gómez-Bombarelli et al. 2018). For such models, proper latent codes are crucial for navigating the molecular space.

We propose a new graph generative model, MolGrow. Starting with a single node, it iteratively splits every node into two.
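Since every node is split into two at each step, the graph size doubles per level, so a molecule is generated through a fixed ladder of power-of-two sizes. A minimal sketch of that size schedule (an illustration of the doubling scheme, not the authors' code; the function name is ours):

```python
def grow_levels(num_atoms: int) -> list[int]:
    """Graph sizes visited when growing from a single node by
    splitting every node into two at each level. Sizes double,
    so the target size is the next power of two >= num_atoms."""
    target = 1 << max(0, (num_atoms - 1).bit_length())
    sizes, n = [], 1
    while n <= target:
        sizes.append(n)
        n *= 2
    return sizes

print(grow_levels(19))  # [1, 2, 4, 8, 16, 32]: a 19-atom molecule is padded to 32 nodes
```

This also shows why the latent manifold has a fixed size: a molecule with up to 2^L atoms always passes through exactly L + 1 levels.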
Our model is invertible and maps molecular structures onto a fixed-size hierarchical manifold. Top levels of the manifold define the global structure, while the bottom levels influence local features. Our contributions are three-fold:

• We propose a hierarchical normalizing flow model for generating molecular graphs. The model gradually increases the graph size during sampling, starting with a single node;
• We propose a fragment-oriented atom ordering that improves our model over the commonly used breadth-first search ordering;
• We apply our model to distribution learning and property optimization tasks. We report distribution learning metrics (Fréchet ChemNet distance and fragment distribution) for graph generative models, in addition to the standard uniqueness and validity measures.

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Background: Normalizing Flows

Normalizing flows are generative models that transform a prior distribution p(z) into a target distribution p(x) by composing invertible functions f_k:

z = f_K ∘ … ∘ f_2 ∘ f_1(x),  (1)
x = f_1^{-1} ∘ … ∘ f_{K-1}^{-1} ∘ f_K^{-1}(z).  (2)

We call Equation 1 the forward path and Equation 2 the inverse path. The prior distribution p(z) is often a standard multivariate normal distribution N(0, I). Such models are trained by maximizing the training set log-likelihood using the change of variables formula:

log p(x) = log p(z) + Σ_{i=1}^{K} log |det(dh_i / dh_{i-1})|,  (3)

where h_i = f_i(h_{i-1}) and h_0 = x. To train the model and sample from it efficiently, the inverse transformations and Jacobian determinants should be tractable and computationally cheap. In this work, we consider three types of layers: an invertible linear layer, actnorm, and the real-valued non-volume preserving transformation (RealNVP) (Dinh, Sohl-Dickstein, and Bengio 2017). We define these layers below for arbitrary d-dimensional vectors, and extend them to graph-structured data in the next section.
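The change-of-variables training above hinges on layers whose inverse and log-determinant are cheap. A RealNVP affine coupling layer achieves this by transforming one half of the vector conditioned on the other, so its Jacobian is triangular. A minimal NumPy sketch (the linear maps Ws and Wt are toy stand-ins for the learned scale and shift networks, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6  # toy dimensionality; the vector is split into two halves

# Hypothetical toy parameters standing in for learned networks s(.) and t(.)
Ws = rng.normal(scale=0.1, size=(d // 2, d // 2))
Wt = rng.normal(scale=0.1, size=(d // 2, d // 2))

def coupling_forward(x):
    x1, x2 = x[: d // 2], x[d // 2 :]
    s, t = np.tanh(Ws @ x1), Wt @ x1            # scale and shift from the first half
    z = np.concatenate([x1, x2 * np.exp(s) + t])
    log_det = s.sum()                            # triangular Jacobian: log|det| = sum(s)
    return z, log_det

def coupling_inverse(z):
    z1, z2 = z[: d // 2], z[d // 2 :]
    s, t = np.tanh(Ws @ z1), Wt @ z1             # recompute s, t from the untouched half
    return np.concatenate([z1, (z2 - t) * np.exp(-s)])

x = rng.normal(size=d)
z, log_det = coupling_forward(x)
assert np.allclose(coupling_inverse(z), x)       # exact invertibility, no matrix inversion
```

Both directions cost one pass through the conditioning networks, and the log-determinant in Equation 3 is a simple sum, which is exactly the tractability the text asks for.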
We consider the invertible linear layer parameterization of Hoogeboom, Van Den Berg, and Welling (2019), which uses a QR decomposition of the weight matrix: h = QR · z, where Q is an orthogonal matrix (Q^{-1} = Q^T) and R is an upper triangular matrix.

The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)
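The appeal of the QR parameterization is that both the inverse and the log-determinant avoid general matrix inversion: Q is inverted by transposition, R by triangular solve, and |det(QR)| is the product of R's diagonal. A small NumPy sketch of this idea (here we obtain a valid Q, R pair by decomposing a random matrix, rather than learning them as in the cited parameterization):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# A valid orthogonal/upper-triangular pair from QR-decomposing a random matrix
Q, R = np.linalg.qr(rng.normal(size=(d, d)))

def linear_forward(z):
    h = Q @ R @ z
    # det(QR) = det(Q) det(R) = ±prod(diag(R)), so the log|det| is a cheap sum
    log_det = np.log(np.abs(np.diag(R))).sum()
    return h, log_det

def linear_inverse(h):
    # Q^{-1} = Q^T; R is upper triangular, so a solve suffices (no explicit inverse)
    return np.linalg.solve(R, Q.T @ h)

z = rng.normal(size=d)
h, log_det = linear_forward(z)
assert np.allclose(linear_inverse(h), z)
```

In a trained layer R's diagonal would be kept nonzero (e.g. strictly positive) so the map stays invertible and the log-determinant finite.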

References

[1] Nicola De Cao et al. MolGAN: An implicit generative model for small molecular graphs. arXiv, 2018.

[2] Alán Aspuru-Guzik et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nature Biotechnology, 2019.

[3] Stephen Dunn. Smiles. 1932.

[4] Nikos Komodakis et al. GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders. ICANN, 2018.

[5] Weinan Zhang et al. GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation. ICLR, 2020.

[6] Prafulla Dhariwal et al. Glow: Generative Flow with Invertible 1x1 Convolutions. NeurIPS, 2018.

[7] Marwin H. S. Segler et al. GuacaMol: Benchmarking Models for De Novo Molecular Design. J. Chem. Inf. Model., 2018.

[8] Nikos Komodakis et al. Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs. CVPR, 2017.

[9] Dmitry Vetrov et al. Deterministic Decoding for Discrete Data in Variational Autoencoders. AISTATS, 2020.

[10] Thierry Kogej et al. Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks. ACS Central Science, 2017.

[11] Mike Preuss et al. Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 2017.

[12] Fei Wang et al. MoFlow: An Invertible Flow Model for Generating Molecular Graphs. KDD, 2020.

[13] David Rogers et al. Extended-Connectivity Fingerprints. J. Chem. Inf. Model., 2010.

[14] Matt J. Kusner et al. Grammar Variational Autoencoder. ICML, 2017.

[15] Alán Aspuru-Guzik et al. SELFIES: a robust representation of semantically constrained graphs with an example application in chemistry. arXiv, 2019.

[16] Regina Barzilay et al. Junction Tree Variational Autoencoder for Molecular Graph Generation. ICML, 2018.

[17] Wei Lu et al. Attention Guided Graph Convolutional Networks for Relation Extraction. ACL, 2019.

[18] Samy Bengio et al. Density estimation using Real NVP. ICLR, 2016.

[19] Andrey Kazennov et al. The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget, 2016.

[20] Jure Leskovec et al. GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. ICML, 2018.

[21] Alán Aspuru-Guzik et al. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Central Science, 2016.

[22] Lukasz Kaiser et al. Attention is All you Need. NIPS, 2017.

[23] Pietro Liò et al. Graph Attention Networks. ICLR, 2017.

[24] Esben Jannik Bjerrum et al. SMILES Enumeration as Data Augmentation for Neural Network Modeling of Molecules. arXiv, 2017.

[25] Alán Aspuru-Guzik et al. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. Frontiers in Pharmacology, 2018.

[26] Matthias Rarey et al. On the Art of Compiling and Using 'Drug-Like' Chemical Fragment Spaces. ChemMedChem, 2008.

[27] Demis Hassabis et al. Improved protein structure prediction using potentials from deep learning. Nature, 2020.

[28] Steven Skiena et al. Syntax-Directed Variational Autoencoder for Structured Data. ICLR, 2018.

[29] Pavlo O. Dral et al. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 2014.

[30] Max Welling et al. Emerging Convolutions for Generative Normalizing Flows. ICML, 2019.

[31] Noel M. O'Boyle et al. DeepSMILES: An Adaptation of SMILES for Use in Machine-Learning of Chemical Structures. 2018.

[32] Sepp Hochreiter et al. Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery. J. Chem. Inf. Model., 2018.

[33] Jure Leskovec et al. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. NeurIPS, 2018.

[34] Kumar Krishna Agrawal et al. Discrete Flows: Invertible Generative Models of Discrete Data. DGS@ICLR, 2019.

[35] Regina Barzilay et al. Hierarchical Generation of Molecular Graphs using Structural Motifs. ICML, 2020.

[36] Samy Bengio et al. Order Matters: Sequence to sequence for sets. ICLR, 2015.

[37] Motoki Abe et al. GraphNVP: An Invertible Flow Model for Generating Molecular Graphs. arXiv, 2019.

[38] Razvan Pascanu et al. Stabilizing Transformers for Reinforcement Learning. ICML, 2019.

[39] Alán Aspuru-Guzik et al. Optimizing distributions over molecular space. An Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC). 2017.

[40] Gisbert Schneider et al. De Novo Design of Bioactive Small Molecules by Artificial Intelligence. Molecular Informatics, 2018.

[41] Yuval Tassa et al. Continuous control with deep reinforcement learning. ICLR, 2015.

[42] David Weininger et al. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci., 1989.

[43] G. Bemis et al. The properties of known drugs. 1. Molecular frameworks. Journal of Medicinal Chemistry, 1996.

[44] Olexandr Isayev et al. MolecularRNN: Generating realistic molecular graphs with optimized properties. arXiv, 2019.

[45] Tom White et al. Sampling Generative Networks: Notes on a Few Effective Techniques. arXiv, 2016.