Computational methods for prediction of in vitro effects of new chemical structures

Background: With a constant increase in the number of new chemicals synthesized every year, it becomes important to employ the most reliable and fast in silico screening methods to predict their safety and activity profiles. In recent years, in silico prediction methods have received great attention in an attempt to reduce animal experiments for the evaluation of various toxicological endpoints, complementing the theme of replace, reduce and refine. Various computational approaches have been proposed for the prediction of compound toxicity, ranging from quantitative structure-activity relationship modeling to molecular similarity-based methods and machine learning. Within the “Toxicology in the 21st Century” screening initiative, a crowd-sourcing platform was established for the development and validation of computational models to predict the interference of chemical compounds with nuclear receptor and stress response pathways, based on a training set containing more than 10,000 compounds tested in high-throughput screening assays.

Results: Here, we present the results of various molecular similarity-based and machine-learning-based methods on an independent evaluation set containing 647 compounds, as provided by the Tox21 Data Challenge 2014. The Random Forest approach based on MACCS molecular fingerprints and a subset of 13 molecular descriptors, selected through statistical and literature analysis, performed best in terms of the area under the receiver operating characteristic curve. Further, we compared the individual and combined performance of different methods. In retrospect, we also discuss why an ensemble approach, combining a similarity search method with the Random Forest algorithm, outperformed the individual methods, and we explain the intrinsic limitations of the latter.

Conclusions: Our results suggest that, although prediction methods were optimized individually for each modelled target, an ensemble of similarity and machine-learning approaches provides promising performance, indicating its broad applicability in toxicity prediction.
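The ensemble idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the similarity search and Random Forest outputs were combined, so the weighted average in `ensemble_score` is an assumption, as are all function names and the toy data. Fingerprints are represented as sets of on-bit indices, and the Random Forest probabilities are taken as given (hypothetical precomputed values) so that the example needs only the Python standard library.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_score(query: Set[int], actives: List[Set[int]]) -> float:
    """Similarity-search score: highest Tanimoto similarity to any known active."""
    return max((tanimoto(query, ref) for ref in actives), default=0.0)


def ensemble_score(sim: float, rf_prob: float, weight: float = 0.5) -> float:
    """Assumed combination rule: weighted average of the two scores."""
    return weight * sim + (1.0 - weight) * rf_prob


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (active, inactive) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): fingerprints of known actives from a training set.
train_actives = [{1, 4, 7, 9}, {2, 4, 7, 8}]

# Toy evaluation set: fingerprint, precomputed Random Forest probability, true label.
evaluation = [
    ({1, 4, 7, 8}, 0.40, 1),
    ({3, 5, 6}, 0.55, 0),
    ({2, 4, 9}, 0.70, 1),
    ({5, 6, 10}, 0.20, 0),
]

labels = [label for _, _, label in evaluation]
sim_scores = [similarity_score(fp, train_actives) for fp, _, _ in evaluation]
rf_scores = [prob for _, prob, _ in evaluation]
ens_scores = [ensemble_score(s, p) for s, p in zip(sim_scores, rf_scores)]

print("similarity AUC:", roc_auc(labels, sim_scores))
print("random forest AUC:", roc_auc(labels, rf_scores))
print("ensemble AUC:", roc_auc(labels, ens_scores))
```

In practice the fingerprints would come from a cheminformatics toolkit and the probabilities from a trained classifier; the sketch only shows how two per-compound scores can be merged and then compared by ROC AUC.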
