Iterative Clustering Algorithm for Analyzing Temporal Patterns of Gene Expression

Abstract — Microarray experiments are information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. For biologists, a key aim when analyzing microarray data is to group genes based on the temporal patterns of their expression levels. In this paper, we use an iterative clustering method to find temporal patterns of gene expression. We evaluated the performance of this method by applying it to real sporulation data and to simulated data. The patterns obtained using iterative clustering were found to be superior to those obtained using existing clustering algorithms.

Keywords — Clustering, microarray experiments, temporal patterns of gene expression data

I. INTRODUCTION

The rapid development of microarray technologies has made it possible to monitor the expression levels of thousands of genes simultaneously [1]. These technologies have proved a boon to the biological and medical sciences, where they have helped researchers tackle broad problems such as tumor classification. Microarray experiments provide a wealth of information; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. For biologists, a key aim when analyzing microarray data is to group genes based on the temporal patterns of their expression levels, which may provide insights into gene functions and their interactions. Indeed, microarray experiments in various cellular contexts have shown that genes with similar functions often display similar temporal patterns of co-regulation [2], [3]. Because of the large number of genes involved in these experiments and the complexity of biological processes in general, an effective algorithm for clustering genes is crucial to such studies. Clustering analysis faces two problems: how to determine the number of true clusters and how to evaluate the

[1] Hongzhe Li, et al. Clustering of time-course gene expression data using a mixed-effects model with B-splines. Bioinformatics, 2003.

[2] Ronald W. Davis, et al. A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell, 1998.

[3] Susmita Datta, et al. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics, 2003.

[4] S. Dudoit, et al. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology, 2002.

[5] Eytan Domany, et al. Resampling method for unsupervised estimation of cluster validity. Neural Computation, 2001.

[6] Jae Won Lee, et al. Ensemble clustering method based on the resampling similarity measure for gene expression data. 2007.

[7] Jill P. Mesirov, et al. Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 2003.

[8] P. Brown, et al. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 1997.

[9] D. Botstein, et al. The transcriptional program of sporulation in budding yeast. Science, 1998.

[10] Shyamal D. Peddada, et al. Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference. Bioinformatics, 2003.

[11] W. L. Ruzzo, et al. An empirical study on principal component analysis for clustering gene expression data. 2000.

[12] Michael Ruogu Zhang, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 1998.

[13] D. Botstein, et al. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 1998.

[14] E. Lander, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proceedings of the National Academy of Sciences of the United States of America, 2001.

[15] Jae Won Lee, et al. Ensemble clustering method based on the resampling similarity measure for gene expression data. Statistical Methods in Medical Research, 2007.

[16] Anil K. Jain, et al. Bootstrap technique in cluster analysis. Pattern Recognition, 1987.

[17] Satoru Miyano, et al. Statistical analysis of a small set of time-ordered gene expression data using linear splines. Bioinformatics, 2002.