Practical considerations for active machine learning in drug discovery.

Active machine learning enables the automated selection of the most valuable next experiments to improve predictive modelling and hasten the retrieval of active compounds in drug discovery. Although active learning is a long-established theoretical concept and was introduced to drug discovery approximately 15 years ago, its deployment in discovery pipelines across academia and industry remains slow. With the renewed enthusiasm for artificial intelligence and the improved flexibility of laboratory automation, active learning is expected to surge and become a key technology for molecular optimization. This review recapitulates key findings from previous active learning studies to highlight the challenges and opportunities of applying adaptive machine learning to drug discovery. Specifically, it discusses considerations regarding implementation, infrastructural integration, and expected benefits. By focusing on these practical aspects, this review aims to provide insights for scientists planning to implement active learning workflows in their discovery pipelines.
