Methods for mining HTS data.

Data mining is a fast-growing field that is finding application across a wide range of industries. High-throughput screening (HTS) is a crucial part of the drug discovery process at most large pharmaceutical companies, so accurate analysis of HTS data is vital to drug discovery. Given the large quantity of data generated during an HTS campaign, and the importance of analyzing those data effectively, it is unsurprising that data-mining techniques are increasingly applied to HTS data analysis. Taking a broad view of both the HTS process and the data-mining process, we review recent literature that describes the application of data-mining techniques to HTS data.
