Clustering of Time-Course Gene Expression Data

Microarray experiments have been used to measure genes' expression levels under different cellular conditions or along a certain time course. Initial attempts to interpret these data begin with grouping genes according to similarity in their expression profiles. Widely adopted clustering techniques for gene expression data include hierarchical clustering, self-organizing maps, and K-means clustering. Bayesian networks and neural networks have also been applied to gene clustering. Sharan & Shamir [3] provided a survey on this topic. Clustering techniques typically discover the inherent structure of the genes' expression profiles based on some similarity measure, so the clustering results largely depend on how well that similarity measure corresponds to the biological correlation between genes. Before reliable conclusions about biological functions can be drawn from the data, the gene clusters obtained from microarray analysis must be investigated with respect to the known biological roles of those clusters. Current analysis of whole-genome expression focuses on relationships based on global correlation over a whole time course, identifying clusters of genes whose expression levels rise and fall simultaneously. However, genes may be regulated by different regulators over a long time course, and co-regulation during part of that time course does not guarantee global similarity in the gene profiles. Biclustering of microarray gene expression data was introduced by Cheng & Church [2] as a means to discover sets of genes that are co-expressed in only part of the experimental conditions under study. Essentially, overlap between gene clusters is allowed, and many subtle gene clusters are revealed. Since then, several other algorithms have been developed to bicluster gene expression data [4]. However, existing biclustering algorithms do not consider the differences between time-series gene expression data and multi-condition gene expression data. The relations between time points are ignored, and the time points are clustered independently. Yet it is only marginally meaningful biologically if two genes show similar expression patterns at non-consecutive time points. It is therefore necessary to preserve time locality in time-course gene expression data.
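To make the contrast between global correlation and time-local co-expression concrete, the following is a minimal sketch and not the algorithm proposed in this work. It assumes the mean squared residue score from Cheng and Church's biclustering formulation and simply evaluates it on a window of consecutive time points. The two expression profiles and the helper names (`mean_squared_residue`, `pearson`) are hypothetical and chosen only for illustration.

```python
from statistics import mean


def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by rows x cols.

    A score of 0 means every selected gene follows the same pattern
    (up to an additive shift) across the selected columns.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [mean(r) for r in sub]
    col_means = [mean(c) for c in zip(*sub)]
    overall = mean(row_means)  # rows have equal length, so this is the grand mean
    total = 0.0
    for r, rm in zip(sub, row_means):
        for v, cm in zip(r, col_means):
            total += (v - rm - cm + overall) ** 2
    return total / (len(sub) * len(col_means))


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data: 2 genes measured at 8 consecutive time points.
# Gene B tracks gene A (shifted by +1) only during the first 4 time points.
expr = [
    [1.0, 2.0, 3.0, 4.0, 1.0, 5.0, 2.0, 6.0],  # gene A
    [2.0, 3.0, 4.0, 5.0, 6.0, 1.0, 5.0, 0.0],  # gene B
]

window = list(range(0, 4))     # a contiguous window: time locality is preserved
full_course = list(range(0, 8))

global_r = pearson(expr[0], expr[1])
local_msr = mean_squared_residue(expr, [0, 1], window)
full_msr = mean_squared_residue(expr, [0, 1], full_course)

print(f"global Pearson correlation: {global_r:.3f}")
print(f"residue on time window 0-3: {local_msr:.3f}")
print(f"residue on full course:     {full_msr:.3f}")
```

In this toy case the two genes are negatively correlated over the whole time course, yet their residue is zero on the first four consecutive time points. Restricting the candidate columns to contiguous windows, as done here, is one simple way to respect time locality; a general biclustering score would accept any subset of columns, including non-consecutive ones.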

[1] Philip S. Yu, et al. Enhanced biclustering on expression data. Third IEEE Symposium on Bioinformatics and Bioengineering, 2003.

[2] George M. Church, et al. Biclustering of Expression Data. ISMB, 2000.

[3] Roded Sharan, et al. Algorithmic approaches to clustering gene expression data. 2001.

[4] Michael Q. Zhang, et al. Current Topics in Computational Molecular Biology. 2002.