Venn-Abers predictors for improved compound iterative screening in drug discovery

Iterative screening, in which hits selected in one round of screening are used to enrich a compound activity prediction model for the next iteration, enables more efficient screening campaigns. The portion of the compound library to screen in each iteration is often decided arbitrarily, because no accurate information exists on the relationship between screening size and the number of hits that will be retrieved. In this article, a novel method based on Venn-Abers predictors was used to determine the optimal number of compounds to screen in order to obtain a desired number of hits. We found that Venn-Abers predictors provide accurate information to support a reliable and flexible decision about the portion of the compound library that should be screened in each iteration. In addition, the method produced subsets enriched in both hits and hit diversity.
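The link between well-calibrated hit probabilities and screening size can be illustrated with a minimal sketch. It is not the procedure from the article: it assumes each unscreened compound already has one calibrated hit probability (for example, a value derived from its Venn-Abers probability interval), ranks compounds by that probability, and treats the running sum as the expected number of hits. The function name `compounds_to_screen` and the example probabilities are hypothetical.

```python
from itertools import accumulate


def compounds_to_screen(hit_probabilities, desired_hits):
    """Return the smallest number of top-ranked compounds whose summed
    predicted hit probabilities reach desired_hits.

    Assumption: the probabilities are well calibrated, so their sum
    approximates the expected number of hits in the selected subset.
    Returns None if the whole library is not expected to yield enough hits.
    """
    # Screen the most promising compounds first.
    ranked = sorted(hit_probabilities, reverse=True)
    # Running total of expected hits after screening the top n compounds.
    for n, expected_hits in enumerate(accumulate(ranked), start=1):
        if expected_hits >= desired_hits:
            return n
    return None


# Hypothetical calibrated hit probabilities for a five-compound library.
library = [0.25, 0.75, 0.5, 0.125, 0.5]
n_needed = compounds_to_screen(library, desired_hits=1.5)
n_unreachable = compounds_to_screen(library, desired_hits=5)
```

With these illustrative numbers, the three highest-ranked compounds have a summed probability of 1.75, so three compounds are selected for a target of 1.5 expected hits, while a target of 5 hits cannot be met by this library. A more conservative variant would pass the lower bound of each probability interval instead of a single point estimate.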
