Clustering Time-Series Gene Expression Data with Unequal Time Intervals

Clustering time-series gene expression data is a challenging problem with its own constraints: time points cannot be exchanged, because reordering them would yield quite different results and lead to erroneous biological conclusions. We focus on clustering gene expression temporal profiles and devise a novel algorithm for temporal expression profile microarray data. The proposed method introduces the concept of profile alignment, which is achieved by minimizing the area between two aligned profiles. The overall pattern of expression across the time series is captured by combining agglomerative clustering with profile alignment, and the optimal number of clusters for a given dataset is selected by means of a variant of a clustering index. The effectiveness of the approach is demonstrated on two well-known datasets, yeast and serum, and corroborated with a set of pre-clustered yeast genes, on which the method attains very high classification accuracy even though it is an unsupervised scheme.
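
To make the idea of profile alignment concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each profile is a piecewise-linear function of time, that alignment is a vertical shift of one profile, and that the "area" is the integrated squared difference; the abstract does not fix any of these choices. The function name `aligned_area` and its arguments are hypothetical.

```python
def aligned_area(t, x, y):
    """Align profile y to profile x by a vertical shift and measure what is left.

    Assumptions (not taken from the abstract): profiles are piecewise linear
    between the sampling times t, alignment is a vertical shift of y, and the
    area is the integral of the squared difference after shifting.
    Returns (shift, area).
    """
    d = [xi - yi for xi, yi in zip(x, y)]        # difference at each time point
    h = [t1 - t0 for t0, t1 in zip(t, t[1:])]    # interval widths, may be unequal
    span = t[-1] - t[0]

    # The shift minimizing the integrated squared difference is the
    # time-weighted mean of d; the trapezoid rule is exact for piecewise-linear d.
    shift = sum(hk * (d0 + d1) / 2.0 for hk, d0, d1 in zip(h, d, d[1:])) / span

    # Exact integral of the squared residual on each linear segment:
    # width * (e0^2 + e0*e1 + e1^2) / 3.
    e = [di - shift for di in d]
    area = sum(hk * (e0 * e0 + e0 * e1 + e1 * e1) / 3.0
               for hk, e0, e1 in zip(h, e, e[1:]))
    return shift, area


if __name__ == "__main__":
    times = [0.0, 1.0, 3.0, 7.0]                 # unequal time intervals
    gene_a = [1.0, 2.0, 4.0, 3.0]
    gene_b = [-4.0, -3.0, -1.0, -2.0]            # same shape, offset by 5
    print(aligned_area(times, gene_a, gene_b))
```

Because the interval widths enter both the shift and the area, the same expression values measured on a different sampling schedule give a different distance, which is the reason unequal time intervals matter. A pairwise distance of this kind could then be passed to any standard agglomerative clustering routine.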
