Inverse‐QSPR for de novo Design: A Review

The use of computational tools to solve chemistry‐related problems has given rise to a large and growing number of publications over the last decades. This field of science is now well recognized and known as chemoinformatics. Among chemoinformatics techniques, statistics‐based approaches to property prediction have been the subject of numerous studies, reflecting both new developments and many applications. The resulting predictive models, which relate a property to molecular features (descriptors), are gathered under the acronym QSPR, for Quantitative Structure‐Property Relationships. Beyond the obvious use of such models to predict property values for new compounds, their use to virtually synthesize new molecules (de novo design) is currently a subject of high interest. Inverse‐QSPR (i‐QSPR) methods have hence been developed to accelerate the discovery of new materials that meet a set of specifications. In this manuscript, we review the i‐QSPR methodologies published in the open literature, highlighting the developments, applications, improvements and limitations of each.
