Clustering Gene Expression Patterns

Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multicondition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is $O(n^2 (\log n)^c)$. We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its performance is demonstrated on simulated data and on real gene expression data, with very promising results.
