Atom Coloring for Chemical Interpretation and De Novo Design for Molecular Design

Prediction of biological activities is valuable for finding active compounds efficiently, and considerable attention has been devoted to in silico prediction in the drug discovery process. For in silico prediction, quantitative structure-activity relationship (QSAR) analysis is widely known to be useful [1, 2]. The basic purpose of QSAR is to construct a statistical model that reveals the relationship between chemical structures and their biological activities. For the statistical analysis, chemical structures are usually represented by several kinds of chemical descriptors. A QSAR model that has been successfully trained and properly validated can then be used to predict the biological activities of new molecules. In addition, a physicochemical and/or mechanistic interpretation can be expected from the chemical descriptors selected in the QSAR model.

As a multivariate statistical method, partial least squares (PLS) is of particular interest in QSAR studies [3]. PLS can analyze data with numerous, strongly collinear, and noisy descriptors, and it can model several biological activities simultaneously. It also provides statistical measures such as applicability domains and diagnostic plots, which help to extract the complex patterns embedded in the data set. Recently, PLS has been extended and modified to cope with the severe demands of complex data structures [4, 5]. However, the major restriction of PLS is that only linear relationships can be extracted from the data [3]. Since many structure-activity data sets are inherently nonlinear, it is desirable to have a flexible method that can model nonlinear relationships. Accordingly, there has been considerable interest in machine learning (ML) methods such as Bayesian approaches [6, 7] and support vector regression (SVR) [8, 9] for nonlinear modeling.
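The PLS regression discussed above can be illustrated with a minimal sketch of single-response PLS (PLS1) fitted by the NIPALS algorithm in pure Python. It is an illustration only, not the modeling software used in this work, and the names `pls1_fit` and `pls1_predict` are hypothetical. The usage section exercises two properties mentioned in the text: perfectly collinear descriptors are handled without inverting X^T X, and a purely quadratic response is missed entirely because PLS extracts only linear relationships.

```python
"""Minimal PLS1 (single-response PLS) regression with the NIPALS algorithm.

Illustrative sketch only; function names are hypothetical.
"""
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Model = Tuple[Vector, float, List[Tuple[Vector, Vector, float]]]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X: Matrix, y: Vector, n_components: int) -> Model:
    """Fit PLS1 by NIPALS; returns the centring constants and (w, p, q) per component."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    # Mean-centred working copies (the inputs are left untouched).
    E = [[v - mu for v, mu in zip(row, x_mean)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weights: covariance of each descriptor with the current y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:  # residual y is uncorrelated with X: nothing left to model
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # latent scores
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = _dot(f, t) / tt  # y loading
        # Deflation: remove the part of X and y explained by this component.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model: Model, x: Vector) -> float:
    """Predict one sample by applying the same centring and deflation steps."""
    x_mean, y_mean, components = model
    r = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(r, w)
        y_hat += q * t
        r = [ri - t * pj for ri, pj in zip(r, p)]
    return y_hat


if __name__ == "__main__":
    # Perfectly collinear descriptors (x2 = 2 * x1) with y = 3 * x1 + 1:
    # X^T X is singular, yet one PLS component recovers the relationship.
    collinear = pls1_fit([[1, 2], [2, 4], [3, 6], [4, 8]], [4, 7, 10, 13], 1)
    print(pls1_predict(collinear, [5, 10]))  # close to 16

    # Purely quadratic response y = x^2 on a symmetric range: its linear
    # covariance with x is zero, so PLS can only return the mean of y.
    quadratic = pls1_fit([[-2], [-1], [0], [1], [2]], [4, 1, 0, 1, 4], 2)
    print(pls1_predict(quadratic, [3]))  # prints 2.0, although 3^2 = 9
```

Prediction reuses the per-component deflation instead of forming an explicit coefficient vector, which avoids any matrix inversion; with as many components as the rank of the centred descriptor matrix, PLS1 reproduces the ordinary least-squares fit.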
In general, since ML methods apply mathematical transformations to the chemical descriptors, they have the drawback that the correlation between the biological activity and the original descriptors is lost. As a result, direct interpretation of the model is not an easy task. Many papers on ML have reported high classification and regression performance, but unfortunately they have not addressed chemical interpretation [10]. For chemical interpretation, we employed the extended-connectivity fingerprint (ECFP) as the chemical descriptor for the statistical model. ECFP makes it easier to understand which substructures are correlated with a specific biological activity. An atom score was calculated from the degree of contribution of each substructure to the model. By visualizing the atom scores with graded colors, an atom color map was drawn onto each compound.
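The atom-coloring procedure described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. `circular_features` is a simplified Morgan-style relabelling (no bond orders, charges, or hydrogens, and readable string identifiers instead of hashed integers) rather than the published ECFP algorithm. The coefficient table stands in for a trained linear model. Spreading each substructure's coefficient evenly over the atoms it covers is one plausible weighting, since the exact atom-score formula is not given here.

```python
"""Sketch of ECFP-style atom coloring: circular substructures -> atom scores -> colors.

Hypothetical illustration; identifiers and the score weighting are assumptions.
"""
from collections import deque
from typing import Dict, List, Set, Tuple


def _atoms_within(adjacency: List[List[int]], start: int, r: int) -> Set[int]:
    """Breadth-first search: all atoms at most r bonds away from start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        atom, d = frontier.popleft()
        if d == r:
            continue
        for nbr in adjacency[atom]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return seen


def circular_features(elements: List[str], adjacency: List[List[int]],
                      radius: int = 2) -> List[Tuple[str, Set[int]]]:
    """Return (identifier, covered_atoms) for every atom and every radius 0..radius."""
    # Initial invariant (assumption): element symbol plus heavy-atom degree.
    labels = [f"{el};{len(nbrs)}" for el, nbrs in zip(elements, adjacency)]
    features = []
    for r in range(radius + 1):
        if r > 0:
            # Each atom's new label = old label + sorted old labels of its neighbours.
            labels = [
                f"{labels[a]}({','.join(sorted(labels[n] for n in adjacency[a]))})"
                for a in range(len(elements))
            ]
        for a in range(len(elements)):
            features.append((labels[a], _atoms_within(adjacency, a, r)))
    return features


def atom_scores(elements: List[str], adjacency: List[List[int]],
                coefficients: Dict[str, float], radius: int = 2) -> List[float]:
    """Each substructure occurrence spreads its model coefficient evenly over its atoms."""
    scores = [0.0] * len(elements)
    for fid, atoms in circular_features(elements, adjacency, radius):
        c = coefficients.get(fid, 0.0)
        for a in atoms:
            scores[a] += c / len(atoms)
    return scores


def score_to_rgb(score: float, max_abs: float) -> Tuple[float, float, float]:
    """Graded color: blue for negative, white for zero, red for positive scores."""
    s = 0.0 if max_abs == 0 else max(-1.0, min(1.0, score / max_abs))
    if s >= 0:
        return (1.0, 1.0 - s, 1.0 - s)
    return (1.0 + s, 1.0 + s, 1.0)


if __name__ == "__main__":
    # Ethanol heavy-atom graph C0-C1-O2 (toy example).
    elements = ["C", "C", "O"]
    adjacency = [[1], [0, 2], [1]]
    # Hypothetical coefficients, e.g. from a linear model trained on these features.
    coefficients = {"O;1": 1.0, "C;1": -0.5}
    scores = atom_scores(elements, adjacency, coefficients)
    max_abs = max(abs(s) for s in scores)
    for el, s in zip(elements, scores):
        print(el, s, score_to_rgb(s, max_abs))
```

Normalizing by the largest absolute score keeps the color scale comparable within one compound; a shared maximum across a compound series would instead make colors comparable between compounds.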

[1] Kimito Funatsu et al. GA Strategy for Variable Selection in QSAR Studies: Enhancement of Comparative Molecular Binding Energy Analysis by GA-Based PLS Method. 1999.

[2] Qing-Song Xu et al. Support vector machines and its applications in chemistry. 2009.

[3] Inverse QSAR Study Using Evolutionary Algorithm. 2009.

[4] Anne Mai Wassermann et al. SARANEA: A Freely Available Program To Mine Structure-Activity and Structure-Selectivity Relationship Information in Compound Data Sets. 2010, J. Chem. Inf. Model.

[5] Kimito Funatsu et al. Non-linear modeling and chemical interpretation with aid of support vector machine and regression. 2010, Current Computer-Aided Drug Design.

[6] Rieko Arimoto et al. Computational models for predicting interactions with cytochrome p450 enzyme. 2006, Current Topics in Medicinal Chemistry.

[7] Kimito Funatsu et al. Advanced PLS Techniques in Chemometrics and Their Applications to Molecular Design. 2011.

[8] Xiaoyang Xia et al. Classification of kinase inhibitors using a Bayesian model. 2004, Journal of Medicinal Chemistry.

[9] I. Kuntz. Structure-Based Strategies for Drug Design and Discovery. 1992, Science.

[10] David Rogers et al. Extended-Connectivity Fingerprints. 2010, J. Chem. Inf. Model.

[11] K. Funatsu et al. Tailored scoring function of Trypsin–benzamidine complex using COMBINE descriptors and support vector regression. 2008.

[12] Kimito Funatsu et al. GA Strategy for Variable Selection in QSAR Studies: GA-Based PLS Analysis of Calcium Channel Antagonists. 1997, J. Chem. Inf. Comput. Sci.

[13] Peter Gedeck et al. Exploiting QSAR models in lead optimization. 2008, Current Opinion in Drug Discovery & Development.

[14] Yu Zong Chen et al. Prediction of Cytochrome P450 3A4, 2D6, and 2C9 Inhibitors and Substrates by Using Support Vector Machines. 2005, J. Chem. Inf. Model.

[15] Francis Eng Hock Tay et al. Feature Selection for Support Vector Machines. 2000, IDEAL.

[16] Jürgen Bajorath et al. Rationalizing Three-Dimensional Activity Landscapes and the Influence of Molecular Representations on Landscape Topology and the Formation of Activity Cliffs. 2010, J. Chem. Inf. Model.

[17] Kimito Funatsu et al. Exhaustive Structure Generation for Inverse-QSPR/QSAR. 2010, Molecular Informatics.

[18] S P Gupta et al. A quantitative structure-activity relationship study on some matrix metalloproteinase and collagenase inhibitors. 2003, Bioorganic & Medicinal Chemistry.

[19] Jean-Pierre Doucet et al. Nonlinear SVM Approaches to QSPR/QSAR Studies and Drug Design. 2007.

[20] Qian Liu et al. Tagged fragment method for evolutionary structure-based de novo lead generation and optimization. 2007, Journal of Medicinal Chemistry.

[21] R. Wade et al. Prediction of drug binding affinities by comparative binding energy analysis. 1997, Journal of Medicinal Chemistry.

[22] Z R Li et al. Machine learning approaches for predicting compounds that interact with therapeutic and ADMET related proteins. 2007, Journal of Pharmaceutical Sciences.

[23] C W Yap et al. Regression methods for developing QSAR and QSPR models to predict compounds of specific pharmacodynamic, pharmacokinetic and toxicological properties. 2007, Mini Reviews in Medicinal Chemistry.

[24] S. Wold et al. PLS-regression: a basic tool of chemometrics. 2001.

[25] Kiyoshi Hasegawa and Kimito Funatsu. Data Modeling and Chemical Interpretation of ADME Properties Using Regression and Rule Mining Techniques. 2009.

[26] P. Groenen et al. Modern multidimensional scaling. 1996.

[27] Rajarshi Guha et al. On the interpretation and interpretability of quantitative structure–activity relationship models. 2008, J. Comput. Aided Mol. Des.

[28] Qi Wang et al. Docking and 3D-QSAR Studies on Isatin Sulfonamide Analogues as Caspase-3 Inhibitors. 2009, J. Chem. Inf. Model.

[29] Gary B. Fogel et al. A Novel In Silico Approach to Drug Discovery via Computational Intelligence. 2009, J. Chem. Inf. Model.

[30] Ian T. Crosby et al. Homology Modeling and Docking Evaluation of Aminergic G Protein-Coupled Receptors. 2010, J. Chem. Inf. Model.

[31] Kimito Funatsu et al. Advanced PLS Techniques in Chemoinformatics Studies. 2010, Current Computer-Aided Drug Design.

[32] Kimito Funatsu et al. Quantitative Prediction of Regioselectivity Toward Cytochrome P450/3A4 Using Machine Learning Approaches. 2010, Molecular Informatics.

[33] Philip Prathipati et al. Global Bayesian Models for the Prioritization of Antitubercular Agents. 2008, J. Chem. Inf. Model.