Classification using functional data analysis for temporal gene expression data

MOTIVATION: Temporal gene expression profiles provide an important characterization of gene function, as biological systems are predominantly developmental and dynamic. We propose a method for classifying collections of temporal gene expression curves in which individual expression profiles are modeled as independent realizations of a stochastic process. The method uses a recently developed functional logistic regression tool based on functional principal components to classify gene expression curves into known gene groups. The number of eigenfunctions in the classifier can be chosen by leave-one-out cross-validation so as to minimize the classification error.

RESULTS: We demonstrate that this methodology provides low-error-rate classification for both yeast cell-cycle gene expression profiles and Dictyostelium cell-type-specific gene expression patterns. It also works well in simulations. We compare our functional principal components approach with a B-spline implementation of functional discriminant analysis on the yeast cell-cycle data and in simulations. The comparison indicates advantages for our approach, which uses fewer eigenfunctions/basis functions. The proposed methodology is promising for the analysis of temporal gene expression data and beyond.

AVAILABILITY: MATLAB programs are available upon request.
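The classification pipeline summarized in the abstract (functional principal component scores fed into a logistic regression, with the number of eigenfunctions chosen by leave-one-out cross-validation) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation. It assumes densely and regularly sampled curves, replaces smoothed eigenfunction estimates with the eigenvectors of the raw sample covariance, and adds a small ridge penalty to keep the logistic fit stable; all function names and the simulated two-group data are hypothetical.

```python
import numpy as np

def fpc_scores(curves, n_components):
    """Project curves (one per row) onto the leading eigenvectors of their
    sample covariance, a discretized stand-in for eigenfunctions."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    # Rows of vt are orthonormal eigenvectors of the sample covariance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    return centered @ basis.T, mean, basis

def fit_logistic(scores, labels, ridge=1e-2, n_iter=25):
    """Logistic regression on the scores via Newton's method.
    The ridge term is an assumption of this sketch, added for stability."""
    X = np.column_stack([np.ones(len(scores)), scores])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (labels - p) - ridge * beta
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def loo_error(curves, labels, n_components):
    """Leave-one-out misclassification rate; the principal components and
    the logistic fit are both recomputed without the held-out curve."""
    n = len(curves)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        scores, mean, basis = fpc_scores(curves[keep], n_components)
        beta = fit_logistic(scores, labels[keep])
        held_out = (curves[i] - mean) @ basis.T
        predicted = int(beta[0] + held_out @ beta[1:] > 0)
        errors += int(predicted != labels[i])
    return errors / n

def choose_n_components(curves, labels, candidates):
    """Pick the candidate with the lowest leave-one-out error,
    breaking ties in favor of fewer components."""
    errors = {k: loo_error(curves, labels, k) for k in candidates}
    best = min(candidates, key=lambda k: (errors[k], k))
    return best, errors

# Simulated data (hypothetical): two groups of noisy periodic profiles.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 20)
n_per_group = 20
group0 = rng.normal(1.0, 0.3, (n_per_group, 1)) * np.sin(2 * np.pi * t)
group1 = rng.normal(1.0, 0.3, (n_per_group, 1)) * np.cos(2 * np.pi * t)
curves = np.vstack([group0, group1])
curves = curves + rng.normal(0.0, 0.3, curves.shape)
labels = np.repeat([0, 1], n_per_group)

best_k, errors = choose_n_components(curves, labels, candidates=[1, 2, 3, 4])
print("chosen number of components:", best_k)
print("leave-one-out error by number of components:", errors)
```

Refitting the principal components inside each leave-one-out fold keeps the held-out curve from influencing the basis used to classify it. Sparsely or irregularly sampled profiles would need a smoothing-based estimate of the eigenfunctions instead of the plain SVD used here.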
