Finding explained groups of time-course gene expression profiles with predictive clustering trees.

In biology, time-course data are usually analyzed in two steps. Similar temporal profiles are first clustered; the clusters are then described, drawing on expert knowledge (e.g., by identifying Gene Ontology terms that are enriched in each cluster). In this paper, we investigate the application of predictive clustering trees (PCTs) to the analysis of time series data. PCTs are part of the more general framework of predictive clustering, which unifies clustering and prediction. Their advantage over conventional clustering approaches is that they partition the time-course data into homogeneous clusters while simultaneously providing symbolic descriptions of those clusters. We evaluate our approach on multiple yeast microarray time series datasets. Each dataset records the change over time in the expression level of yeast genes in response to a specific change in environmental conditions. We demonstrate that PCTs can, in a single step, cluster genes with similar temporal profiles, yield a predictive model of the temporal profiles of genes based on a cluster prototype, and provide cluster descriptions.
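The single-step idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes binary gene annotations (the labels `term_A` and `term_B` are hypothetical), squared Euclidean distance between profiles, and the point-wise mean as the cluster prototype; the abstract does not specify any of these choices. Each internal node tests one annotation term, so the path to a leaf is the symbolic description of that cluster, and the leaf prototype is the prediction.

```python
from statistics import fmean

def prototype(profiles):
    """Point-wise mean profile of a cluster (assumed prototype)."""
    return [fmean(col) for col in zip(*profiles)]

def sse(profiles):
    """Sum of squared Euclidean distances of profiles to their prototype."""
    if not profiles:
        return 0.0
    proto = prototype(profiles)
    return sum((x - m) ** 2 for p in profiles for x, m in zip(p, proto))

def build_pct(genes, terms, min_size=2):
    """Grow a tree over genes given as (set_of_terms, profile) pairs.

    At each node, pick the term whose yes/no split minimizes the summed
    within-cluster SSE; stop when no split reduces the SSE.
    """
    profiles = [p for _, p in genes]
    best = None
    for t in terms:
        yes = [g for g in genes if t in g[0]]
        no = [g for g in genes if t not in g[0]]
        if len(yes) < min_size or len(no) < min_size:
            continue
        score = sse([p for _, p in yes]) + sse([p for _, p in no])
        if best is None or score < best[0]:
            best = (score, t, yes, no)
    if best is None or best[0] >= sse(profiles):
        # Leaf: the cluster is summarized by its prototype.
        return {"prototype": prototype(profiles), "size": len(genes)}
    _, t, yes, no = best
    rest = [u for u in terms if u != t]
    return {
        "term": t,
        "yes": build_pct(yes, rest, min_size),
        "no": build_pct(no, rest, min_size),
    }

def predict(tree, gene_terms):
    """Follow the term tests to a leaf and return its prototype profile."""
    while "term" in tree:
        tree = tree["yes"] if tree["term"] in gene_terms else tree["no"]
    return tree["prototype"]

# Toy data (invented for illustration): four genes, three time points.
genes = [
    ({"term_A"}, [0.0, 1.0, 2.0]),
    ({"term_A", "term_B"}, [0.0, 1.2, 2.1]),
    ({"term_B"}, [0.0, -1.0, -2.0]),
    (set(), [0.1, -1.1, -1.9]),
]
tree = build_pct(genes, ["term_A", "term_B"])
predicted = predict(tree, {"term_A"})
print(tree["term"], predicted)
```

In this toy example the tree splits on the annotation that separates rising from falling profiles, so the cluster description and the predictive model come from the same structure.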
