Hierarchical Signature Clustering for Time-series Microarray Data

Existing clustering techniques produce clusters from time-series microarray data, but the distance metrics they use lack interpretability for this type of data. Some previous methods focus on matching expression levels, whereas the genes of interest here are those that behave in the same manner but at varying levels. Such genes are not clustered together under a Euclidean metric and are indiscernible under a correlation metric, so we propose a more appropriate metric and a modified hierarchical clustering method to highlight them. The use of hashing and bucket sort allows for fast clustering, and the hierarchical dendrogram allows direct comparison, with an easily understood meaning for the distance. The method also extends readily to k-means clustering when the desired number of clusters is known.
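The abstract does not define the signature or the proposed metric, so the following is only a minimal sketch of the general idea under stated assumptions. Here a "signature" is assumed to be the direction of change (up, down, or flat) between consecutive time points, the distance is assumed to be the number of time steps at which two signatures disagree, and hashing is approximated with a dictionary keyed by signature. The names `signature`, `signature_distance`, `bucket_by_signature`, and the parameter `tol` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import dist


def signature(profile, tol=0.0):
    """Assumed signature: sign of change between consecutive time points."""
    sig = []
    for prev, curr in zip(profile, profile[1:]):
        delta = curr - prev
        if delta > tol:
            sig.append(1)    # expression rises
        elif delta < -tol:
            sig.append(-1)   # expression falls
        else:
            sig.append(0)    # change within tolerance
    return tuple(sig)


def signature_distance(sig_a, sig_b):
    """Assumed metric: number of time steps where the directions differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def bucket_by_signature(profiles, tol=0.0):
    """Group genes with identical signatures via a hash table (dict)."""
    buckets = defaultdict(list)
    for name, profile in profiles.items():
        buckets[signature(profile, tol)].append(name)
    return dict(buckets)


# Illustrative data only: geneA and geneB move in the same directions
# at very different levels; geneC moves in the opposite directions.
profiles = {
    "geneA": [1.0, 2.0, 1.5, 3.0],
    "geneB": [10.0, 14.0, 11.0, 20.0],
    "geneC": [3.0, 2.0, 2.5, 1.0],
}

buckets = bucket_by_signature(profiles)
euclid_ab = dist(profiles["geneA"], profiles["geneB"])
sig_ac = signature_distance(signature(profiles["geneA"]),
                            signature(profiles["geneC"]))
```

In this toy data, geneA and geneB fall into the same bucket even though their Euclidean distance is large, while the assumed signature distance between geneA and geneC has a direct reading: the number of time steps at which the two genes move in different directions.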
