Molecular Descriptors for Structure-Activity Applications: A Hands-On Approach.

Molecular descriptors capture diverse aspects of molecular structure and underpin many contemporary computer-assisted toxicological and chemical applications. After briefly introducing some fundamental concepts of structure-activity applications (e.g., molecular descriptor dimensionality, classical vs. fingerprint descriptions, and activity landscapes), this chapter guides readers step by step through the rationale and application of molecular descriptors. To this end, the chapter presents a case study of a recently published application of molecular descriptors for modeling activity on cytochrome P450.
