Nonpher: computational method for design of hard-to-synthesize structures

Abstract

In cheminformatics, machine learning methods are typically used to classify chemical compounds into distinct classes such as active/inactive or toxic/nontoxic. To train a classifier, the training data set must contain examples from both the positive and the negative class. Biological activity or toxicity can be measured experimentally. Another important molecular property, synthetic feasibility, is a more abstract feature that cannot be easily assessed. In the present paper, we introduce Nonpher, a computational method for constructing a virtual library of hard-to-synthesize compounds. Nonpher is based on a molecular morphing algorithm in which new structures are iteratively generated by simple structural changes, such as the addition or removal of an atom or a bond. In Nonpher, molecular morphing was optimized so that it yields structures that are hard to synthesize without being overly complex. Nonpher results were compared with those of SAscore and dense region (DR), two other methods for generating hard-to-synthesize compounds. A random forest classifier trained on Nonpher data achieves better results than models obtained using SAscore and DR data.
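The morphing idea (iterative single-step edits that add or remove an atom or a bond) can be illustrated with a small sketch in plain Python. It is not Nonpher's implementation. The toy hydrogen-suppressed graph `Mol`, the `VALENCE` table, the connectivity rules, and the ring-count stopping criterion are all hypothetical stand-ins. The abstract does not specify Nonpher's actual operators or stopping criteria.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field

# Hypothetical single-bond valences for a toy, hydrogen-suppressed graph.
VALENCE = {"C": 4, "N": 3, "O": 2}


@dataclass
class Mol:
    """Toy molecule: heavy-atom symbols plus undirected bonds stored as (i, j) with i < j."""
    atoms: list[str]
    bonds: set[tuple[int, int]] = field(default_factory=set)

    def degree(self, i: int) -> int:
        return sum(1 for a, b in self.bonds if i in (a, b))

    def free_valence(self, i: int) -> int:
        return VALENCE[self.atoms[i]] - self.degree(i)

    def rings(self) -> int:
        # Cyclomatic number of a connected graph: E - V + 1.
        return len(self.bonds) - len(self.atoms) + 1


def connected(n: int, bonds: set[tuple[int, int]]) -> bool:
    """Depth-first check that all n atoms are reachable from atom 0."""
    if n == 0:
        return True
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for a, b in bonds:
            v = b if a == u else a if b == u else None
            if v is not None and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n


def add_atom(m: Mol, rng: random.Random) -> Mol | None:
    # Attach a new atom to any atom that still has free valence.
    hosts = [i for i in range(len(m.atoms)) if m.free_valence(i) > 0]
    if not hosts:
        return None
    host, new = rng.choice(hosts), len(m.atoms)
    return Mol(m.atoms + [rng.choice(list(VALENCE))], m.bonds | {(host, new)})


def remove_atom(m: Mol, rng: random.Random) -> Mol | None:
    # Only terminal (degree-1) atoms are removed, so the graph stays connected.
    leaves = [i for i in range(len(m.atoms)) if m.degree(i) == 1]
    if not leaves or len(m.atoms) < 3:
        return None
    k = rng.choice(leaves)
    # Drop bonds to atom k and shift higher indices down by one.
    bonds = {(a - (a > k), b - (b > k)) for a, b in m.bonds if k not in (a, b)}
    return Mol(m.atoms[:k] + m.atoms[k + 1:], bonds)


def add_bond(m: Mol, rng: random.Random) -> Mol | None:
    # Ring closure between two non-bonded atoms that both have free valence.
    open_atoms = [i for i in range(len(m.atoms)) if m.free_valence(i) > 0]
    pairs = [(a, b) for a in open_atoms for b in open_atoms
             if a < b and (a, b) not in m.bonds]
    if not pairs:
        return None
    return Mol(list(m.atoms), m.bonds | {rng.choice(pairs)})


def remove_bond(m: Mol, rng: random.Random) -> Mol | None:
    # Only ring bonds may be cut, so the molecule stays in one piece.
    cuttable = sorted(e for e in m.bonds if connected(len(m.atoms), m.bonds - {e}))
    if not cuttable:
        return None
    return Mol(list(m.atoms), m.bonds - {rng.choice(cuttable)})


OPERATORS = [add_atom, remove_atom, add_bond, remove_bond]


def morph(start: Mol, max_steps: int = 50, max_rings: int = 3, seed: int = 0) -> list[Mol]:
    """Random walk of single-step structural edits (a sketch, not Nonpher's code).

    Stops after max_steps or once the ring count (a crude, hypothetical
    complexity proxy) exceeds max_rings.
    """
    rng = random.Random(seed)
    mol, path = start, [start]
    for _ in range(max_steps):
        nxt = rng.choice(OPERATORS)(mol, rng)
        if nxt is None:  # operator not applicable to this structure
            continue
        mol = nxt
        path.append(mol)
        if mol.rings() > max_rings:
            break
    return path
```

For example, `morph(Mol(["C", "C", "O"], {(0, 1), (1, 2)}), seed=42)` returns the list of visited structures. Each structure differs from its predecessor by exactly one atom or bond edit. Atom removal is restricted to terminal atoms and bond removal to ring bonds, so every intermediate remains a single connected molecule.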