A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming

Analysis of chemical graphs is a major research topic in computational molecular biology due to its potential applications to drug design. One approach is inverse quantitative structure-activity/property relationship (inverse QSAR/QSPR) analysis, which infers chemical structures from given chemical activities/properties. Recently, a framework has been proposed for inverse QSAR/QSPR using artificial neural networks (ANN) and mixed integer linear programming (MILP). This method consists of a prediction phase and an inverse prediction phase. In the first phase, a feature vector $f(G)$ of a chemical graph $G$ is introduced and a prediction function $\psi$ on a chemical property $\pi$ is constructed with an ANN. In the second phase, given a target value $y^*$ of property $\pi$, a feature vector $x^*$ is inferred by solving an MILP formulated from the trained ANN so that $\psi(x^*)$ is close to $y^*$, and then a set of chemical structures $G^*$ such that $f(G^*) = x^*$ is enumerated by a graph search algorithm. The framework has been applied to chemical compounds with cycle index up to 2. Computational experiments on instances with $n$ non-hydrogen atoms show that a feature vector $x^*$ can be inferred for up to around $n=40$, whereas graphs $G^*$ can be enumerated only for up to $n=15$. When applied to chemical acyclic graphs, the maximum computable diameter of $G^*$ was around 8. We introduce a new characterization of graph structure, "branch-height," based on which an MILP formulation and a graph search algorithm are designed for chemical acyclic graphs. The results of computational experiments using properties such as octanol/water partition coefficient, boiling point and heat of combustion suggest that the proposed method can infer chemical acyclic graphs $G^*$ with $n=50$ and diameter 30.
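
To illustrate how a trained ANN can be turned into MILP constraints in the inverse prediction phase, the following sketch shows the standard big-M encoding of a single ReLU neuron. It is not necessarily the exact formulation used in the paper. The symbols $w_i$, $b$, $z$, $y$, $\delta$, $L$, $U$, $\eta$ and $\varepsilon$ are assumptions introduced here for illustration: $z$ is the weighted input to the neuron, $y = \max(0, z)$ its output, $L < 0 < U$ are known bounds on $z$, $\delta$ is a binary variable indicating whether the neuron is active, $\eta$ is the network output $\psi(x)$, and $\varepsilon > 0$ is a tolerance that makes "close to $y^*$" precise.

$$
\begin{aligned}
z &= \sum_i w_i x_i + b, \qquad L \le z \le U, \\
y &\ge 0, \qquad y \ge z, \\
y &\le z - L\,(1 - \delta), \qquad y \le U\,\delta, \qquad \delta \in \{0, 1\}, \\
y^* - \varepsilon &\le \eta \le y^* + \varepsilon .
\end{aligned}
$$

If $\delta = 1$, the constraints force $y = z \ge 0$; if $\delta = 0$, they force $y = 0$ and $z \le 0$. In both cases $y = \max(0, z)$, so applying this encoding to every neuron and adding linear constraints that make $x$ the feature vector of some chemical graph yields an MILP whose feasible solutions are candidate vectors $x^*$.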
