Data Mining Techniques in High Content Screening: A Survey

Advanced microscopy and the corresponding image analysis have evolved in recent years into a compelling tool for studying molecular and morphological events in cells and tissues. Cell-based High-Content Screening (HCS) is an emerging technique for investigating cellular processes and their alteration by multiple chemical or genetic perturbations. Analysing the large amount of data generated in HCS experiments is a significant challenge and is currently a bottleneck in many screening projects. This article reviews the different ways to analyse large sets of HCS data, including the questions that can be asked and the challenges in interpreting the measurements. The main data mining approaches used in HCS are the computation of image descriptors, normalization, quality control methods and classification algorithms.
