Supervised extensions of chemography approaches: case studies of chemical liabilities assessment

Chemical liabilities, such as adverse effects and toxicity, play a significant role in the modern drug discovery process. In silico assessment of chemical liabilities is an important step aimed at reducing costs and animal testing by complementing or replacing in vitro and in vivo experiments. Herein, we propose an approach that combines several classification and chemography methods to predict chemical liabilities and to interpret the results in terms of how structural changes in compounds affect their pharmacological profile. To our knowledge, a supervised extension of Generative Topographic Mapping (GTM) is proposed here for the first time as an effective chemography method. We also propose a new approach for mapping new data with supervised Isomap without rebuilding the model from scratch. Two approaches for estimating a model’s applicability domain are used in this study, to our knowledge for the first time in chemoinformatics. Interpretation of the models revealed the structural alerts responsible for undesirable features of the compounds’ pharmacological profiles.
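The abstract does not spell out how supervised Isomap works. As a hedged illustration only, one published supervised variant, S-Isomap (Geng et al., 2005), replaces Euclidean distances with a class-aware dissimilarity and then runs the usual Isomap steps (k-nearest-neighbour graph, geodesic distances, classical MDS). The sketch below assumes that formulation; the function names, the default `alpha`, and the choice of `beta` as the mean pairwise distance are illustrative assumptions rather than the authors' settings, and the out-of-sample mapping proposed in the paper is not shown.

```python
import numpy as np


def supervised_dissimilarity(X, y, alpha=0.5, beta=None):
    """Class-aware dissimilarity in the style of S-Isomap (Geng et al., 2005).

    Same-class pairs map into [0, 1); different-class pairs map to values
    above 1 - alpha, so classes are pulled apart before embedding.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between descriptor vectors
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    if beta is None:
        # Assumption: beta = mean off-diagonal Euclidean distance
        off_diag = ~np.eye(len(X), dtype=bool)
        beta = np.sqrt(d2[off_diag]).mean()
    same = y[:, None] == y[None, :]
    # np.where evaluates both branches; exp(d2 / beta) can overflow for
    # very distant points, so rescale descriptors if that happens.
    return np.where(same,
                    np.sqrt(1.0 - np.exp(-d2 / beta)),
                    np.sqrt(np.exp(d2 / beta)) - alpha)


def isomap_from_dissimilarity(D, n_neighbors=5, n_components=2):
    """Plain Isomap steps on a precomputed dissimilarity matrix D."""
    n = D.shape[0]
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    # Symmetrised k-nearest-neighbour graph (column 0 of each sorted row
    # is normally the point itself, at dissimilarity 0)
    order = np.argsort(D, axis=1)
    for i in range(n):
        for j in order[i, 1:n_neighbors + 1]:
            G[i, j] = G[j, i] = D[i, j]
    # Geodesic distances by Floyd-Warshall: O(n^3), adequate for small sets
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distance matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```

Because the class-aware dissimilarity tends to make each point's nearest neighbours members of its own class, `n_neighbors` may need to be larger than in unsupervised Isomap to keep the graph connected across classes.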
