Clustering of short time-course gene expression data with dissimilar replicates

Microarrays are used in genetics and medicine to examine large numbers of genes simultaneously through their expression levels under a condition of interest, such as a disease. The information from these experiments can be enriched by following expression levels over time and across biological replicates. The purpose of this study is to propose an algorithm that clusters genes according to the similarity of their behavior over time. The algorithm also aims to highlight genes that behave differently across replicates and to separate constant genes, which keep their baseline expression levels throughout the study. Finally, we incorporate cluster validation techniques to suggest a sensible number of clusters when it is not known a priori. The illustrations show that the proposed algorithm offers a fast approach to clustering genes by the similarity of their behavior, and that it separates the constant genes and the genes with dissimilar replicates without any need for pre-processing. Moreover, it succeeds in suggesting the correct number of clusters when that number is not known in advance.
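The abstract does not spell out the algorithm, so the following is only a minimal sketch of the three ideas it names, built on hypothetical choices that are not taken from the paper. A gene is treated as constant if every replicate stays within a tolerance `TOL` of its first time point; replicates are called dissimilar if their up/flat/down step patterns disagree; the remaining genes are clustered by average-linkage hierarchical clustering on their baseline-aligned mean profiles, and the number of clusters is chosen by the mean silhouette width. The names `TOL`, `moves`, `screen`, `cluster_genes`, the threshold value, and the use of the silhouette as the validation index are illustrative assumptions.

```python
import math
from itertools import combinations

# Illustrative sketch only: thresholds, step encoding and validation index are
# assumptions, not the algorithm proposed in the paper.
TOL = 0.5  # hypothetical tolerance on the expression scale


def moves(profile, tol=TOL):
    """Encode a profile as up (+1), down (-1) or flat (0) steps between time points."""
    return tuple(0 if abs(b - a) <= tol else (1 if b > a else -1)
                 for a, b in zip(profile, profile[1:]))


def screen(genes, tol=TOL):
    """Split genes (name -> list of replicate profiles) into three groups."""
    constant, dissimilar, candidates = [], [], {}
    for name, reps in genes.items():
        if all(abs(v - r[0]) <= tol for r in reps for v in r):
            constant.append(name)  # every replicate stays near its own baseline
        elif len({moves(r, tol) for r in reps}) > 1:
            dissimilar.append(name)  # replicates disagree on the direction of change
        else:
            mean = [sum(col) / len(col) for col in zip(*reps)]
            candidates[name] = [v - mean[0] for v in mean]  # baseline-aligned mean profile
    return constant, dissimilar, candidates


def dist(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def average_linkage(points, k):
    """Agglomerative clustering with average linkage until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def link(c1, c2):
        return sum(dist(points[i], points[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        clusters[a] = merged
    return clusters


def silhouette(points, clusters):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    scores = []
    for i, p in enumerate(points):
        own = clusters[label[i]]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist(p, points[j]) for j in other) / len(other)
                for c, other in enumerate(clusters) if c != label[i])
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def cluster_genes(genes, k_max=6, tol=TOL):
    """Screen genes, then keep the k in 2..k_max with the best mean silhouette.

    Returns (constant, dissimilar, best) where best is (score, clusters of names),
    or None when fewer than three genes remain to be clustered.
    """
    constant, dissimilar, candidates = screen(genes, tol)
    names = list(candidates)
    points = [candidates[n] for n in names]
    best = None
    for k in range(2, min(k_max, len(points) - 1) + 1):
        clusters = average_linkage(points, k)
        score = silhouette(points, clusters)
        if best is None or score > best[0]:
            best = (score, [[names[i] for i in c] for c in clusters])
    return constant, dissimilar, best
```

On a toy input with two replicates per gene, made of three well-separated pairs of genes (rising, falling, transient peak) plus one flat gene and one gene whose replicates move in opposite directions, this sketch sets aside the flat and the discordant gene before clustering, and the silhouette criterion selects three clusters. Screening before clustering keeps flat and discordant genes from distorting the distances used to form and validate the clusters.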
