Characterisation of data resources for in silico modelling: benchmark datasets for ADME properties

ABSTRACT

Introduction: The cost of in vivo and in vitro screening of the absorption, distribution, metabolism and excretion (ADME) properties of compounds has motivated efforts to develop a range of in silico models. Data are at the heart of any computational model: high-quality data are essential for developing robust and accurate models. The characteristics of a dataset, such as its availability, size, format and the type of chemical identifiers used, influence the modelability of the data.

Areas covered: This review explores how useful publicly available ADME datasets are to researchers developing predictive models. More than 140 ADME datasets were collated from publicly available resources, and the modelability of 31 selected datasets was assessed using specific criteria derived in this study.

Expert opinion: Publicly available datasets differ significantly in information content and presentation. From a modelling perspective, datasets should be of adequate size and available in a user-friendly format, with every chemical structure associated with one or more chemical identifiers suitable for automated processing (e.g. CAS number, SMILES string or InChIKey). Recommendations are discussed for assessing whether a dataset is suitable for modelling and for publishing data in an appropriate format.
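The requirement that every structure carry an identifier suitable for automated processing can be screened mechanically before any modelling starts. The following is a minimal sketch, assuming each dataset record is a Python dict with optional `cas`, `smiles` and `inchikey` fields (hypothetical field names, not taken from any of the reviewed resources). It checks the layout of standard InChIKeys, verifies the CAS Registry Number check digit, and reports the fraction of records carrying at least one usable identifier. It is an illustrative screen only, not the modelability criteria derived in the review: a well-formed identifier can still point to the wrong structure, and SMILES are only tested for being non-empty, because parsing them requires a cheminformatics toolkit.

```python
from __future__ import annotations

import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")
# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_inchikey(key: str) -> bool:
    """Check InChIKey layout only; this does not prove the key resolves to a structure."""
    return INCHIKEY_RE.fullmatch(key.strip()) is not None


def is_valid_cas(cas: str) -> bool:
    """Check CAS RN layout and its check digit."""
    match = CAS_RE.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body;
    # the check digit is the weighted sum modulo 10.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def has_machine_readable_id(record: dict) -> bool:
    """True if the record carries at least one identifier usable for automated processing."""
    if is_valid_inchikey(record.get("inchikey") or ""):
        return True
    if is_valid_cas(record.get("cas") or ""):
        return True
    # SMILES is only checked for being non-empty here (assumption: parsing is done elsewhere).
    return bool((record.get("smiles") or "").strip())


def identifier_coverage(records: list[dict]) -> float:
    """Fraction of records with at least one machine-readable identifier."""
    if not records:
        return 0.0
    return sum(has_machine_readable_id(r) for r in records) / len(records)


if __name__ == "__main__":
    example = [
        {"name": "water", "cas": "7732-18-5", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
        {"name": "ethanol", "smiles": "CCO"},
        {"name": "typo", "cas": "7732-18-4"},  # wrong check digit, no other identifier
    ]
    print(f"{identifier_coverage(example):.2f}")  # prints 0.67
```

In practice a screen like this would sit at the start of a curation pipeline, flagging records that need manual structure assignment before descriptors are calculated.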
