A new approach to analyzing gene expression time series data

We present algorithms for time-series gene expression analysis that permit the principled estimation of unobserved time points, clustering, and dataset alignment. Each expression profile is modeled as a cubic spline (a piecewise polynomial) estimated from the observed data, so that every time point influences the overall smooth expression curve. We constrain the spline coefficients of genes in the same class to be similar, reflecting their similar expression patterns, while also allowing for gene-specific parameters. We show that our method reconstructs unobserved time points with 10-15% less error than the previous best methods. Our clustering algorithm operates directly on the continuous representations of gene expression profiles, and we demonstrate that this is particularly effective on non-uniformly sampled data. Our continuous alignment algorithm also avoids difficulties encountered by discrete approaches. In particular, it lets the user control the number of degrees of freedom of the warp by specifying parameterized warping functions, which helps to avoid overfitting. We demonstrate that our algorithm produces stable, low-error alignments on real expression data, and we further show a specific application to yeast knockout data that produces biologically meaningful results.
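To make the idea of a continuous profile concrete, the following is a minimal sketch of estimating an unobserved time point from non-uniformly sampled measurements with a natural cubic spline. It is a deliberate simplification and not the method described above: it interpolates a single gene exactly, with no noise model and no sharing of coefficients across genes in a class. The function name `natural_cubic_spline` and the sample values are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(ts, ys):
    """Return a function that evaluates the natural cubic spline through (ts, ys).

    Simplifying assumption: exact interpolation of one profile, with zero
    second derivative at both ends (the "natural" boundary condition).
    """
    n = len(ts)
    if n < 3 or len(ys) != n:
        raise ValueError("need at least 3 points and equal-length inputs")
    if any(ts[i + 1] <= ts[i] for i in range(n - 1)):
        raise ValueError("time points must be strictly increasing")

    h = [ts[i + 1] - ts[i] for i in range(n - 1)]  # interval widths

    # Tridiagonal system for the interior second derivatives m[1..n-2].
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
    rhs = [
        6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        for i in range(1, n - 1)
    ]

    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, len(diag)):
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    inner = [0.0] * len(diag)
    inner[-1] = rhs[-1] / diag[-1]
    for k in range(len(diag) - 2, -1, -1):
        inner[k] = (rhs[k] - h[k + 1] * inner[k + 1]) / diag[k]
    m = [0.0] + inner + [0.0]  # natural boundary: zero curvature at the ends

    def evaluate(t):
        if t < ts[0] or t > ts[-1]:
            raise ValueError("t lies outside the sampled time range")
        # Index of the interval [ts[i], ts[i+1]] that contains t.
        i = min(max(bisect_right(ts, t) - 1, 0), n - 2)
        a, b = ts[i + 1] - t, t - ts[i]
        return (
            m[i] * a ** 3 / (6.0 * h[i])
            + m[i + 1] * b ** 3 / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


# Hypothetical, non-uniformly sampled profile (time in minutes, log ratios).
times = [0.0, 10.0, 30.0, 40.0, 70.0]
levels = [0.1, 0.8, 1.2, 0.4, -0.3]
profile = natural_cubic_spline(times, levels)
estimate_at_20 = profile(20.0)  # estimate for an unobserved time point
```

Because the second derivatives are obtained from one linear system over all knots, every measurement affects the curve everywhere, which is the property the abstract relies on. A fit closer to the approach described above would replace exact interpolation with a noise-tolerant estimate and tie the coefficients of co-clustered genes together.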
