Paper information - Nonpher: computational method for design of hard-to-synthesize structures

Abstract: In cheminformatics, machine learning methods are typically used to classify chemical compounds into distinct classes such as active/nonactive or toxic/nontoxic. To train a classifier, the training data set must contain examples from both the positive and the negative class. While biological activity or toxicity can be measured experimentally, synthetic feasibility, another important molecular property, is a more abstract feature that cannot be assessed easily. In the present paper, we introduce Nonpher, a computational method for the construction of a virtual library of hard-to-synthesize structures. Nonpher is based on a molecular morphing algorithm in which new structures are generated iteratively by simple structural changes, such as the addition or removal of an atom or a bond. In Nonpher, molecular morphing was optimized so that it yields structures that are not overly complex but are still hard to synthesize. Nonpher results were compared with those of SAscore and dense region (DR), two other methods for generating hard-to-synthesize compounds. A random forest classifier trained on Nonpher data achieves better results than models trained on SAscore and DR data.
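The morphing idea described in the abstract (iteratively applying small edits such as adding or removing an atom or a bond) can be illustrated with a toy sketch. This is not the Nonpher or Molpher implementation: the graph representation, the operator functions (`add_atom`, `remove_atom`, `add_bond`, `remove_bond`), the carbon-only atoms, and the `MAX_DEGREE` valence cap are all assumptions made for illustration, and the complexity filtering mentioned in the abstract is not modeled.

```python
import random

MAX_DEGREE = 4  # assumed valence cap (carbon-like); not taken from the paper


def degree(bonds, atom):
    """Number of bonds that touch the given atom id."""
    return sum(1 for bond in bonds if atom in bond)


def is_connected(atoms, bonds):
    """Depth-first search check that the graph is a single fragment."""
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for bond in bonds:
            if current in bond:
                (other,) = bond - {current}
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return len(seen) == len(atoms)


def add_atom(atoms, bonds, rng):
    """Attach a new carbon to a randomly chosen atom with a free valence."""
    anchors = [a for a in sorted(atoms) if degree(bonds, a) < MAX_DEGREE]
    if not anchors:
        return None
    new_id = max(atoms) + 1
    bond = frozenset((rng.choice(anchors), new_id))
    return {**atoms, new_id: "C"}, bonds | {bond}


def remove_atom(atoms, bonds, rng):
    """Remove a randomly chosen terminal atom (exactly one bond)."""
    terminal = [a for a in sorted(atoms) if degree(bonds, a) == 1]
    if len(atoms) <= 2 or not terminal:
        return None
    victim = rng.choice(terminal)
    kept_atoms = {a: e for a, e in atoms.items() if a != victim}
    return kept_atoms, frozenset(b for b in bonds if victim not in b)


def add_bond(atoms, bonds, rng):
    """Bond two currently unbonded atoms, which closes a ring."""
    ids = [a for a in sorted(atoms) if degree(bonds, a) < MAX_DEGREE]
    candidates = [
        frozenset((a, b))
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if frozenset((a, b)) not in bonds
    ]
    if not candidates:
        return None
    return atoms, bonds | {rng.choice(candidates)}


def remove_bond(atoms, bonds, rng):
    """Remove a bond only if the molecule stays in one piece."""
    candidates = [
        b for b in sorted(bonds, key=sorted)
        if is_connected(atoms, bonds - {b})
    ]
    if not candidates:
        return None
    return atoms, bonds - {rng.choice(candidates)}


def morph(atoms, bonds, steps, seed=0):
    """Apply `steps` randomly chosen operators; inapplicable ones are skipped."""
    rng = random.Random(seed)
    operators = [add_atom, remove_atom, add_bond, remove_bond]
    for _ in range(steps):
        result = rng.choice(operators)(atoms, bonds, rng)
        if result is not None:
            atoms, bonds = result
    return atoms, bonds


# Start from a propane-like chain C-C-C and morph it for a few steps.
start_atoms = {0: "C", 1: "C", 2: "C"}
start_bonds = frozenset({frozenset((0, 1)), frozenset((1, 2))})
morphed_atoms, morphed_bonds = morph(start_atoms, start_bonds, steps=50, seed=1)
print(len(morphed_atoms), "atoms,", len(morphed_bonds), "bonds")
```

Each operator returns a new structure rather than mutating its input, so a chain of morphs can be recorded or replayed with the same seed. A realistic version would work on chemically valid structures and would need the stopping and filtering criteria that the paper describes for keeping the output hard to synthesize without being overly complex.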

Daniel Svozil | Milan Voršilák
