Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning

We report a statistically principled method for quantifying the uncertainty of machine learning models for molecular property prediction. We show that this uncertainty estimate can be used to design experiments judiciously.
