Statistical analysis of a small set of time-ordered gene expression data using linear splines

MOTIVATION: Recently, the temporal response of genes to changes in their environment has been investigated using cDNA microarray technology by measuring gene expression levels at a small number of time points. Conventional techniques for time series analysis are not suitable for such a short series of time-ordered data. The analysis of gene expression data has therefore usually been limited to a fold-change analysis rather than a systematic statistical approach.

METHODS: We use the maximum likelihood method together with Akaike's Information Criterion to fit linear splines to a small set of time-ordered gene expression data in order to infer statistically meaningful information from the measurements. The significance of the measured gene expression data is assessed using Student's t-test.

RESULTS: Previous gene expression measurements of the cyanobacterium Synechocystis sp. PCC6803 were reanalyzed using linear splines. The temporal response of many genes that had been missed by a fold-change analysis was identified. Based on our statistical analysis, we found that at least about four gene expression measurements are needed at each time point.
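The fitting procedure named in the methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Gaussian noise with a common variance, restricts the knots to the measured time points, always keeps the first and last time point as knots, and counts one parameter per knot plus one for the variance. The function names (`hat_basis`, `fit_linear_spline`, `select_knots_by_aic`) and the demo data are hypothetical.

```python
import itertools
import numpy as np


def hat_basis(t, knots):
    """Design matrix of piecewise-linear 'hat' functions, one column per knot.

    Column i is the linear interpolant that equals 1 at knots[i] and 0 at
    every other knot, so the coefficients are the spline values at the knots.
    """
    t = np.asarray(t, dtype=float)
    eye = np.eye(len(knots))
    return np.column_stack(
        [np.interp(t, knots, eye[i]) for i in range(len(knots))]
    )


def fit_linear_spline(t, y, knots):
    """Maximum likelihood fit of a linear spline under Gaussian noise.

    With Gaussian errors the ML estimate of the spline is the least-squares
    fit, and the ML estimate of the variance is RSS / n.
    """
    y = np.asarray(y, dtype=float)
    X = hat_basis(t, knots)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n = len(y)
    # Guard against log(0) when the fit is exact (assumption of this sketch).
    sigma2 = max(float(resid @ resid) / n, np.finfo(float).tiny)
    n_params = len(knots) + 1  # spline values at the knots + noise variance
    aic = n * np.log(2.0 * np.pi * sigma2) + n + 2 * n_params
    return coef, sigma2, aic


def select_knots_by_aic(t, y):
    """Try every subset of interior time points as knots; keep the lowest AIC."""
    times = np.unique(np.asarray(t, dtype=float))
    first, last, interior = times[0], times[-1], times[1:-1]
    best = None
    for r in range(len(interior) + 1):
        for subset in itertools.combinations(interior, r):
            knots = np.array([first, *subset, last])
            coef, _, aic = fit_linear_spline(t, y, knots)
            if best is None or aic < best[0]:
                best = (aic, knots, coef)
    return best


if __name__ == "__main__":
    # Made-up data: 5 time points, 4 replicate measurements per time point.
    rng = np.random.default_rng(0)
    t = np.repeat([0.0, 1.0, 2.0, 3.0, 4.0], 4)
    y = np.repeat([0.0, 0.0, 2.0, 2.0, 2.0], 4) + rng.normal(0.0, 0.2, t.size)
    aic, knots, coef = select_knots_by_aic(t, y)
    print("selected knots:", knots)
    print("spline values at knots:", coef)
    print("AIC:", aic)
```

Exhaustive enumeration of knot subsets is feasible here only because the number of time points is small; with m interior time points there are 2^m candidate models.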
