Paper information - Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning

We report a statistically principled method for quantifying the uncertainty of machine learning models for molecular property prediction. We show that this uncertainty estimate can be used to design experiments judiciously.

Alpha A. Lee | Yao Zhang
