Data Quality Assurance and Statistical Analysis of High Throughput Screenings for Drug Discovery

Abstract: High throughput screening (HTS) is an important tool in modern drug discovery. Many recent, successful drugs can be traced back to HTS [1]. The platform has spread from the pharmaceutical industry to national laboratories (e.g., the NIH Molecular Libraries Screening Centers Network) and to academic institutions. Besides gaining throughput, from thousands of molecules in the early days to millions now, HTS has been adapted to increasingly sophisticated biological assays such as high content imaging. The vast amount of biological data from these screens makes it a significant challenge to identify interesting molecules across diverse biological processes. Because HTS is intrinsically noisy and most assays involve complex biological processes, HTS results need careful analysis to identify reliable hit molecules. Various data normalization and analysis algorithms have been developed by different groups over the years. In this chapter, we briefly describe some common issues encountered in HTS and in the related analysis.

[1]  Fabio Gasparri, et al.  An overview of cell phenotypes in HCS: limitations and advantages, 2009, Expert opinion on drug discovery.

[2]  Paul Labute,et al.  A probabilistic approach to high throughput drug discovery. , 2002, Combinatorial chemistry & high throughput screening.

[3]  Tudor I. Oprea,et al.  Pursuing the leadlikeness concept in pharmaceutical research. , 2004, Current opinion in chemical biology.

[4]  R. König,et al.  A probability-based approach for the analysis of large-scale RNAi screens , 2007, Nature Methods.

[5]  Meir Glick,et al.  Enrichment of Extremely Noisy High-Throughput Screening Data Using a Naïve Bayes Classifier , 2004, Journal of biomolecular screening.

[6]  Kaisheng Chen,et al.  In silico gene function prediction using ontology-based pattern identification , 2005, Bioinform..

[7]  Bert Gunter,et al.  Statistical and Graphical Methods for Quality Control Determination of High-Throughput Screening Data , 2003, Journal of biomolecular screening.

[8]  Meir Glick,et al.  Prediction of Biological Targets for Compounds Using Multiple-Category Bayesian Models Trained on Chemogenomics Databases , 2006, J. Chem. Inf. Model..

[9]  B. Shoichet Screening in a spirit haunted world. , 2006, Drug discovery today.

[10]  John M. Barnard,et al.  Chemical Similarity Searching , 1998, J. Chem. Inf. Comput. Sci..

[11]  Robert Nadon,et al.  An efficient method for the detection and elimination of systematic error in high-throughput screening , 2007, Bioinform..

[12]  Xiaohua Douglas Zhang A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. , 2007, Genomics.

[13]  Wolfgang Huber,et al.  Analysis of cell-based RNAi screens , 2006, Genome Biology.

[14]  Brad T. Sherman,et al.  Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources , 2008, Nature Protocols.

[15]  David B Volkin,et al.  Application of a high-throughput screening procedure with PEG-induced precipitation to compare relative protein solubility during formulation development with IgG1 monoclonal antibodies. , 2011, Journal of pharmaceutical sciences.

[16]  T. Bayes An essay towards solving a problem in the doctrine of chances, 2003.

[17]  Isabel Coma,et al.  Statistics and decision making in high-throughput screening. , 2009, Methods in molecular biology.

[18]  Jörg Hüser,et al.  High‐throughput Screening for Targeted Lead Discovery, 2006.

[19]  Frank K Brown,et al.  Practical Approaches to Efficient Screening: Information-Rich Screening Protocol , 2004, Journal of biomolecular screening.

[20]  Xiaoyang Xia,et al.  Classification of kinase inhibitors using a Bayesian model. , 2004, Journal of medicinal chemistry.

[21]  Yingyao Zhou,et al.  The Plasmodium falciparum sexual development transcriptome: a microarray analysis using ontology-based pattern identification. , 2005, Molecular and biochemical parasitology.

[22]  Amy S. Espeseth,et al.  Host Cell Factors in HIV Replication: Meta-Analysis of Genome-Wide Studies , 2009, PLoS pathogens.

[23]  M F Engels,et al.  Smart screening: approaches to efficient HTS. , 2001, Current opinion in drug discovery & development.

[24]  W. Aird,et al.  Robo4 is an effective tumor endothelial marker for antibody-drug conjugates based on the rapid isolation of the anti-Robo4 cell-internalizing antibody. , 2013, Blood.

[25]  Richard Simon,et al.  A random variance model for detection of differential gene expression in small microarray experiments , 2003, Bioinform..

[26]  Xiaohua Douglas Zhang,et al.  An Effective Method for Controlling False Discovery and False Nondiscovery Rates in Genome-Scale RNAi Screens , 2010, Journal of biomolecular screening.

[27]  Tudor I. Oprea,et al.  Post-High-Throughput Screening Analysis: An Empirical Compound Prioritization Scheme , 2005, Journal of biomolecular screening.

[28]  E. Schadt,et al.  Integrating siRNA and protein-protein interaction data to identify an expanded insulin signaling network. , 2009, Genome research.

[29]  Bin Chen,et al.  Gaining Insight into Off-Target Mediated Effects of Drug Candidates with a Comprehensive Systems Chemical Biology Analysis , 2009, J. Chem. Inf. Model..

[30]  Marc Ferrer,et al.  Robust statistical methods for hit selection in RNA interference high-throughput screening experiments. , 2006, Pharmacogenomics.

[31]  Gavin Harper,et al.  Process Validation and Screen Reproducibility in High-Throughput Screening , 2009, Journal of biomolecular screening.

[32]  C. Moallemi,et al.  Quantized surface complementarity diversity (QSCD): a model based on small molecule-target complementarity. , 2000, Journal of medicinal chemistry.

[33]  Francisco-Javier Gamo,et al.  Global phenotypic screening for antimalarials. , 2012, Chemistry & biology.

[34]  Alexander Alanine,et al.  Lead generation--enhancing the success of drug discovery by investing in the hit to lead process. , 2003, Combinatorial chemistry & high throughput screening.

[35]  Hans-Joachim Böhm,et al.  A guide to drug discovery: Hit and lead generation: beyond high-throughput screening , 2003, Nature Reviews Drug Discovery.

[36]  D. Swinney,et al.  How were new medicines discovered? , 2011, Nature Reviews Drug Discovery.

[37]  A. Hill,et al.  The possible effects of the aggregation of the molecules of haemoglobin on its dissociation curves, 1910.

[38]  Paul A Johnston,et al.  Identifying Actives from HTS Data Sets , 2011, Journal of biomolecular screening.

[39]  Christopher P Austin,et al.  High-throughput screening assays for the identification of chemical probes. , 2007, Nature chemical biology.

[40]  F. Hampel A General Qualitative Definition of Robustness, 1971.

[41]  K. Horiuchi,et al.  Innovative chemical compound microarrays for drug screening, 2006.

[42]  Stuart L. Schreiber,et al.  Identifying Biologically Active Compound Classes Using Phenotypic Screening Data and Sampling Statistics , 2005, J. Chem. Inf. Model..

[43]  G. Bemis,et al.  The properties of known drugs. 1. Molecular frameworks. , 1996, Journal of medicinal chemistry.

[44]  L. Makings,et al.  A FRET-based assay platform for ultra-high density drug screening of protein kinases and phosphatases. , 2002, Assay and drug development technologies.

[45]  Xiaohua Douglas Zhang,et al.  Illustration of SSMD, z Score, SSMD*, z* Score, and t Statistic for Hit Selection in RNAi High-Throughput Screens , 2011, Journal of biomolecular screening.

[46]  Linda O Narhi,et al.  High-throughput assessment of thermal and colloidal stability parameters for monoclonal antibody formulations. , 2011, Journal of pharmaceutical sciences.

[47]  Eero P. Simoncelli,et al.  Nonlinear image representation for efficient perceptual coding , 2006, IEEE Transactions on Image Processing.

[48]  Jeffrey E. Lee,et al.  Generation of Monoclonal Antibody MS17-57 Targeting Secreted Alkaline Phosphatase Ectopically Expressed on the Surface of Gastrointestinal Cancer Cells , 2013, PloS one.

[49]  D. Bojanic,et al.  Impact of high-throughput screening in biomedical research , 2011, Nature Reviews Drug Discovery.

[50]  Andreas Bender,et al.  Understanding False Positives in Reporter Gene Assays: in Silico Chemogenomics Approaches To Prioritize Cell-Based HTS Data , 2007, J. Chem. Inf. Model..

[51]  Xiaohua Douglas Zhang Genome-wide screens for effective siRNAs through assessing the size of siRNA effects, 2008.

[52]  Amy M. Wiles,et al.  An Analysis of Normalization Methods for Drosophila RNAi Genomic Screens and Development of a Robust Validation Scheme , 2008, Journal of biomolecular screening.

[53]  D. Rogers,et al.  Using Extended-Connectivity Fingerprints with Laplacian-Modified Bayesian Analysis in High-Throughput Screening Follow-Up , 2005, Journal of biomolecular screening.

[54]  Xiaohua Douglas Zhang,et al.  Determination of sample size in genome-scale RNAi screens , 2009, Bioinform..

[55]  Noel Southall,et al.  COPI Complex Is a Regulator of Lipid Homeostasis , 2008, PLoS biology.

[56]  Gary Walker,et al.  Enhancing Hit Quality and Diversity within Assay Throughput Constraints, 2005.

[57]  Anthony E. Klon,et al.  Improved Naïve Bayesian Modeling of Numerical Data for Absorption, Distribution, Metabolism and Excretion (ADME) Property Prediction , 2006, J. Chem. Inf. Model..

[58]  W. Patrick Walters,et al.  A guide to drug discovery: Designing screens: how to make your hits a hit , 2003, Nature Reviews Drug Discovery.

[59]  Hanspeter Gubler,et al.  Methods for Statistical Analysis, Quality Assurance and Management of Primary High‐throughput Screening Data, 2006.

[60]  Bert Gunter,et al.  Improved Statistical Methods for Hit Selection in High-Throughput Screening , 2003, Journal of biomolecular screening.

[61]  Hans-Jörg Roth,et al.  There is no such thing as 'diversity'! , 2005, Current opinion in chemical biology.

[62]  J. Baell,et al.  New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. , 2010, Journal of medicinal chemistry.

[63]  Xiaohua Douglas Zhang,et al.  A New Method with Flexible and Balanced Control of False Negatives and False Positives for Hit Selection in RNA Interference High-Throughput Screening Assays , 2007, Journal of biomolecular screening.

[64]  Andreas Sewing,et al.  Evaluating Real-Life High-Throughput Screening Data , 2005, Journal of biomolecular screening.

[65]  A. Weaver,et al.  Freshwater Forcing: Will History Repeat Itself? , 2008, Science.

[66]  Paul Labute,et al.  Binary QSAR: A New Method for the Determination of Quantitative Structure Activity Relationships , 1998, Pacific Symposium on Biocomputing.

[67]  David E Root,et al.  Detecting Spatial Patterns in Biological Array Experiments , 2003, Journal of biomolecular screening.

[68]  David G. Lambert,et al.  Drugs and receptors, 2004.

[69]  Robert Nadon,et al.  Statistical practice in high-throughput screening data analysis , 2006, Nature Biotechnology.

[70]  James Inglese,et al.  Apparent activity in high-throughput screening: origins of compound-dependent assay interference. , 2010, Current opinion in chemical biology.

[71]  Marc Ferrer,et al.  Median Absolute Deviation to Improve Hit Selection for Genome-Scale RNAi Screens , 2008, Journal of biomolecular screening.

[72]  Peter Sommer,et al.  A novel specific edge effect correction method for RNA interference screenings , 2012, Bioinform..

[73]  D. Pereira,et al.  Origin and evolution of high throughput screening , 2007, British journal of pharmacology.

[74]  C. Brenan,et al.  Nanoliter high-throughput PCR for DNA and RNA profiling. , 2009, Methods in molecular biology.

[75]  V. Socci,et al.  Design and validation of siRNAs and shRNAs. , 2009, Current opinion in molecular therapeutics.

[76]  Christopher P Austin,et al.  A high-throughput screen for aggregation-based inhibition in a large compound library. , 2007, Journal of medicinal chemistry.

[77]  Xiaohua Douglas Zhang,et al.  A method for effectively comparing gene effects in multiple conditions in RNAi and expression-profiling research. , 2009, Pharmacogenomics.

[78]  S Stanley Young,et al.  Using recursive partitioning analysis to evaluate compound selection methods. , 2004, Methods in molecular biology.

[79]  Dongmei Liu,et al.  Quantitative assessment of hit detection and confirmation in single and duplicate high-throughput screenings. , 2008, Journal of biomolecular screening.

[80]  Lorenz M Mayr,et al.  Novel trends in high-throughput screening. , 2009, Current opinion in pharmacology.

[81]  Winston A Hide,et al.  Big data: The future of biocuration , 2008, Nature.

[82]  M. Gerstein,et al.  Unlocking the secrets of the genome , 2009, Nature.

[83]  B. Shoichet,et al.  A common mechanism underlying promiscuous inhibitors from virtual and high-throughput screening. , 2002, Journal of medicinal chemistry.

[84]  Aideen Long,et al.  Statistical methods for analysis of high-throughput RNA interference screens , 2009, Nature Methods.

[85]  Paul T. Groth,et al.  The ENCODE (ENCyclopedia Of DNA Elements) Project , 2004, Science.

[86]  David Rogers,et al.  Extended-Connectivity Fingerprints , 2010, J. Chem. Inf. Model..

[87]  Anne Mai Wassermann,et al.  Biodiversity of small molecules--a new perspective in screening set selection. , 2013, Drug discovery today.

[88]  Stephen D. Pickett,et al.  Design of a Compound Screening Collection for use in High Throughput Screening, 2004.

[89]  B. Shoichet,et al.  A specific mechanism of nonspecific inhibition. , 2003, Journal of medicinal chemistry.

[90]  T. Keating,et al.  Correction for Interference by Test Samples in High-Throughput Assays , 2009, Journal of biomolecular screening.

[91]  Yanli Wang,et al.  A novel method for mining highly imbalanced high-throughput screening data in PubChem , 2009, Bioinform..

[92]  V. Makarenkov,et al.  Statistical Analysis of Systematic Errors in High-Throughput Screening , 2005, Journal of biomolecular screening.

[93]  Sanjay Joshua Swamidass,et al.  Enhancing the rate of scaffold discovery with diversity-oriented prioritization , 2011, Bioinform..

[94]  Adam Yasgar,et al.  Quantitative high-throughput screening: a titration-based approach that efficiently identifies biological activities in large chemical libraries. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[95]  C. Bakal,et al.  Phosphorylation Networks Regulating JNK Activity in Diverse Genetic Backgrounds , 2008, Science.

[96]  C. Bakal,et al.  Genomic screening with RNAi: results and challenges. , 2010, Annual review of biochemistry.

[97]  Bin Zhou,et al.  Chemical and Biological Properties of Frequent Screening Hits , 2012, J. Chem. Inf. Model..

[98]  Jing Li,et al.  Novel Statistical Approach for Primary High-Throughput Screening Hit Selection , 2005, J. Chem. Inf. Model..

[99]  David E Root,et al.  A flexible data analysis tool for chemical genetic screens. , 2004, Chemistry & biology.

[100]  William Stafford Noble,et al.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project , 2007, Nature.

[101]  B. Shoichet,et al.  High-throughput assays for promiscuous inhibitors , 2005, Nature chemical biology.

[102]  Ricardo Macarrón,et al.  Design and Implementation of High Throughput Screening Assays , 2011, Molecular biotechnology.

[103]  Ramm,et al.  Imaging systems in assay screening. , 1999, Drug discovery today.

[104]  Marc Ferrer,et al.  The Use of SSMD-Based False Discovery and False Nondiscovery Rates in Genome-Scale RNAi Screens , 2010, Journal of biomolecular screening.

[105]  Kristin E. D. Coan,et al.  Promiscuous Aggregate-Based Inhibitors Promote Enzyme Unfolding , 2009, Journal of medicinal chemistry.

[106]  Peter J. Rousseeuw,et al.  Robust Distances: Simulations and Cutoff Values, 1991.

[107]  Andreas Bender,et al.  “Plate Cherry Picking”: A Novel Semi-Sequential Screening Paradigm for Cheaper, Faster, Information-Rich Compound Selection , 2007, Journal of biomolecular screening.

[108]  S A Sundberg,et al.  High-throughput and ultra-high-throughput screening: solution- and cell-based approaches. , 2000, Current opinion in biotechnology.

[109]  Yanli Wang,et al.  PubChem BioAssay: 2014 update , 2013, Nucleic Acids Res..

[110]  Thomas D. Y. Chung,et al.  A Simple Statistical Parameter for Use in Evaluation and Validation of High Throughput Screening Assays , 1999, Journal of biomolecular screening.

[111]  Jürgen Bajorath,et al.  Polypharmacology Directed Compound Data Mining: Identification of Promiscuous Chemotypes with Different Activity Profiles and Comparison to Approved Drugs , 2010, J. Chem. Inf. Model..

[112]  N. Perrimon,et al.  Genome-Wide RNAi Analysis of Growth and Viability in Drosophila Cells , 2004, Science.

[113]  Meir Glick,et al.  Streamlining lead discovery by aligning in silico and high-throughput screening. , 2006, Current opinion in chemical biology.

[114]  Christopher P Austin,et al.  Quantitative analyses of aggregation, autofluorescence, and reactivity artifacts in a screen for inhibitors of a thiol protease. , 2010, Journal of medicinal chemistry.

[115]  Lorenz M Mayr,et al.  The Future of High-Throughput Screening , 2008, Journal of biomolecular screening.

[116]  Christian N. Parker,et al.  Application of Chemoinformatics to High-Throughput Screening, 2004.

[117]  S Stanley Young,et al.  Initial compound selection for sequential screening. , 2002, Current opinion in drug discovery & development.

[118]  Mohammad Fallahi-Sichani,et al.  Metrics other than potency reveal systematic variation in responses to cancer drugs. , 2013, Nature chemical biology.

[119]  Lirong Chen,et al.  Small Molecules Blocking the Entry of Severe Acute Respiratory Syndrome Coronavirus into Host Cells , 2004, Journal of Virology.

[120]  Jing Liu,et al.  Experimental Design and Statistical Methods for Improved Hit Detection in High-Throughput Screening , 2010, Journal of biomolecular screening.

[121]  Anthony E. Klon,et al.  Library Fingerprints: A Novel Approach to the Screening of Virtual Libraries , 2007, J. Chem. Inf. Model..

[122]  Robert Nadon,et al.  Systematic error detection in experimental high-throughput screening , 2011, BMC Bioinformatics.

[123]  Andrew I Su,et al.  HierS: hierarchical scaffold clustering using topological chemical graphs. , 2005, Journal of medicinal chemistry.

[124]  Min Xu,et al.  Hit selection with false discovery rate control in genome-scale RNAi screens , 2008, Nucleic acids research.