An Economic Framework to Prioritize Confirmatory Tests after a High-Throughput Screen

How many hits from a high-throughput screen should be sent for confirmatory experiments? Existing analytical answers to this question are derived from statistics alone and aim, for example, to fix the false discovery rate at a predetermined tolerance. These methods, however, neglect local economic context and consequently lead to irrational experimental strategies. In contrast, the authors argue that this question is essentially economic, not statistical, and is amenable to an economic analysis that admits an optimal solution. This solution, in turn, suggests a novel tool for deciding the number of hits to confirm and the marginal cost of discovery, a measure that quantifies the local economic trade-off between true and false positives, yielding an economically optimal experimental strategy. Validated with retrospective simulations and prospective experiments, this strategy identified 157 additional actives that had been erroneously labeled inactive in at least one real-world screening experiment.
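
The stopping rule implied by the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not define its quantities, so everything here is an assumption made for illustration: each hit has a predicted probability of confirming, each confirmatory test has a fixed cost, each confirmed active has a fixed value, and the marginal cost of discovery is taken to be the test cost divided by that probability. The names `hits_to_confirm`, `cost_per_test`, and `value_per_active` are hypothetical.

```python
from typing import List, Tuple


def hits_to_confirm(
    probabilities: List[float],
    cost_per_test: float,
    value_per_active: float,
) -> Tuple[int, List[float]]:
    """Return how many top-ranked hits to confirm and their marginal costs.

    Assumption (not from the source): the marginal cost of discovery for a
    hit is cost_per_test / p, the expected spend per confirmed active when
    testing a compound whose predicted probability of confirming is p.
    """
    # Test the most promising hits first.
    ranked = sorted(probabilities, reverse=True)

    marginal_costs: List[float] = []
    for p in ranked:
        # A hit with no chance of confirming has an unbounded cost.
        marginal_cost = float("inf") if p <= 0 else cost_per_test / p
        # Stop once one more expected discovery costs more than it is worth.
        if marginal_cost > value_per_active:
            break
        marginal_costs.append(marginal_cost)

    return len(marginal_costs), marginal_costs


# Illustrative numbers only: four hits, tests cost 1 unit, an active is worth 5.
n_tests, costs = hits_to_confirm([0.9, 0.5, 0.1, 0.02], 1.0, 5.0)
print(n_tests, costs)
```

Under these assumptions the cutoff depends on the local ratio of test cost to the value of an active, not on a fixed false discovery rate, which is the contrast the abstract draws.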
