Clustering time-varying gene expression profiles using scale-space signals

The functional state of an organism is determined largely by the pattern of expression of its genes. Analysis of gene expression data from gene chips has primarily revolved around clustering and classifying the data with machine learning techniques that use the intensity of expression alone, mostly ignoring the time-varying pattern. In this paper, we present a pattern recognition-based approach that captures similarity by finding salient changes in the time-varying expression patterns of genes. Such changes can give clues about important events, such as gene regulation by cell-cycle phases, or even signal the onset of a disease. Specifically, we observe that dissimilarity between time series is revealed by the sharp twists and bends produced in a higher-dimensional curve formed from the constituent signals. Scale-space analysis is used to detect these sharp twists and bends, and their strength relative to the component signals is estimated to form a shape similarity measure between time profiles. A clustering algorithm is presented that groups gene profiles using the scale-space distance as the dissimilarity measure. Multidimensional curves formed from the time series within each cluster serve as cluster prototypes, or indexes into the gene expression database, and are used to retrieve genes functionally similar to a query gene profile. Clustering with the scale-space distance is compared extensively against clustering with the traditional Euclidean distance on the yeast genome database.
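The idea of reading dissimilarity from the bends of a curve formed from two signals can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the formulas, so the function names (`gaussian_smooth`, `znorm`, `bend_distance`), the choice of scales `sigmas`, and the bend measure itself (the magnitude of the velocity-acceleration cross product of the planar curve (f(t), g(t)), averaged over a few Gaussian smoothing scales) are all assumptions made for illustration.

```python
import numpy as np

def gaussian_smooth(x, sigma):
    """Smooth a 1-D signal with a sampled Gaussian kernel (edge-padded)."""
    radius = max(1, int(round(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(x, radius, mode="edge")
    # 'valid' on the padded signal returns exactly len(x) samples
    return np.convolve(padded, kernel, mode="valid")

def znorm(x):
    """Zero-mean, unit-variance version of a profile (constant profiles stay at 0)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

def bend_distance(f, g, sigmas=(1.0, 2.0, 4.0)):
    """Illustrative (assumed) bend-based dissimilarity between two profiles.

    The two profiles form a planar curve (f(t), g(t)). At each smoothing
    scale, the bend strength is taken as |f' g'' - g' f''|, which vanishes
    wherever the curve is locally a straight line.
    """
    f, g = znorm(f), znorm(g)
    total = 0.0
    for sigma in sigmas:
        fs, gs = gaussian_smooth(f, sigma), gaussian_smooth(g, sigma)
        df, dg = np.gradient(fs), np.gradient(gs)
        ddf, ddg = np.gradient(df), np.gradient(dg)
        total += np.mean(np.abs(df * ddg - dg * ddf))
    return total / len(sigmas)

# Hypothetical profiles sampled at 17 time points
t = np.linspace(0.0, 2.0 * np.pi, 17)
same_shape = bend_distance(np.sin(t), 3.0 * np.sin(t) + 5.0)
different_shape = bend_distance(np.sin(t), np.sin(2.0 * t))
print(same_shape, different_shape)
```

Under these assumptions, two profiles that differ only by offset and amplitude trace a straight line and score essentially zero, while profiles with different shapes trace a curve that bends and score higher. Note that this particular measure also scores a profile and its mirror image (g = -f) as zero, since that curve is a straight line too; a practical measure would need to treat that case separately.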
