Identifying periodically expressed transcripts in microarray time series data

MOTIVATION Microarray experiments are now routinely used to collect large-scale time series data, for example to monitor gene expression during the cell cycle. Statistical analysis of these data poses many challenges. One of them is that it is hard to correctly identify the subset of genes with a clear periodic signature. This has led to a controversial debate about the suitability of both the available methods and current microarray data.

METHODS We introduce two simple but efficient statistical methods for signal detection and gene selection in gene expression time series data. First, we suggest the average periodogram as an exploratory device for graphically assessing whether periodic transcripts are present in the data. Second, we describe an exact statistical test for identifying periodically expressed genes that distinguishes periodic from purely random processes. This identification method is based on the so-called g-statistic and uses the false discovery rate approach to multiple testing.

RESULTS Using simulated data, we show that the suggested method can identify cell-cycle-activated genes in a gene expression data set even if the number of cyclic genes is very small, and regardless of the presence of a dominant non-periodic component in the data. We then re-examine 12 large microarray time series data sets from yeast, human fibroblast, human HeLa and bacterial cells, some of which have been controversially discussed. The statistical analysis shows that a majority of these data sets contain little or no statistically significant evidence for genes with periodic variation linked to cell cycle regulation. For the remaining data sets, on the other hand, the method extends the catalog of previously known cell-cycle-specific transcripts by identifying additional periodic genes not found by other methods. We also discuss the problem of distinguishing periodicity due to generic cell cycle activity from periodicity caused by synchronization artifacts.

AVAILABILITY The approach has been implemented in the R package GeneTS, available from http://www.stat.uni-muenchen.de/~strimmer/software.html under the terms of the GNU General Public License.
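The METHODS paragraph combines three steps. The first is an average periodogram used for visual screening. The second is a per-gene g-statistic test. The third is a false discovery rate correction over all genes. The following is a minimal Python/NumPy sketch of that pipeline. It is not the GeneTS (R) implementation, and all function names are hypothetical.

The sketch makes several assumptions. The periodogram is the raw one, evaluated at the Fourier frequencies k/n for k = 1..q with q = floor((n - 1)/2). The g-statistic is Fisher's g = max_k I(w_k) / sum_k I(w_k), with the exact white-noise p-value P(G > g) = sum_{j=1}^{b} (-1)^(j-1) C(q, j) (1 - j g)^(q-1), where b = floor(1/g). The multiple-testing step uses the Benjamini-Hochberg step-up procedure. The simulated data at the end are invented for illustration and do not reproduce the paper's simulation design.

```python
import numpy as np
from math import comb, floor


def periodogram(x):
    """Raw periodogram I(w_k) = |DFT_k|^2 / n at Fourier frequencies k/n, k = 1..q."""
    x = np.asarray(x, dtype=float)
    n = x.size
    q = (n - 1) // 2                      # excludes frequency 0 and, for even n, Nyquist
    dft = np.fft.fft(x - x.mean())
    spec = np.abs(dft[1:q + 1]) ** 2 / n
    freqs = np.arange(1, q + 1) / n       # cycles per sampling interval
    return freqs, spec


def average_periodogram(data):
    """Average the per-gene periodograms over all rows of `data` (genes x time points)."""
    data = np.asarray(data, dtype=float)
    freqs, _ = periodogram(data[0])
    specs = np.array([periodogram(row)[1] for row in data])
    return freqs, specs.mean(axis=0)


def fisher_g_test(x):
    """Fisher's g-statistic and its exact p-value under Gaussian white noise.
    Assumes x is not constant (otherwise the denominator below is zero)."""
    _, spec = periodogram(x)
    q = spec.size
    g = spec.max() / spec.sum()
    b = min(int(floor(1.0 / g)), q)
    p = sum((-1) ** (j - 1) * comb(q, j) * (1.0 - j * g) ** (q - 1)
            for j in range(1, b + 1))
    return g, min(max(p, 0.0), 1.0)      # clip round-off outside [0, 1]


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest rank meeting the BH bound
        reject[order[:k + 1]] = True     # reject every hypothesis up to that rank
    return reject


# Hypothetical simulation: 1000 genes, 24 time points; the first 20 genes
# carry a sine wave with period 12 on top of N(0, 1) noise.
rng = np.random.default_rng(0)
n_genes, n_times, n_periodic = 1000, 24, 20
t = np.arange(n_times)
data = rng.normal(size=(n_genes, n_times))
data[:n_periodic] += 2.0 * np.sin(2 * np.pi * t / 12)

freqs, avg = average_periodogram(data)
print("peak of average periodogram at frequency", freqs[np.argmax(avg)])
pvals = np.array([fisher_g_test(row)[1] for row in data])
reject = benjamini_hochberg(pvals, alpha=0.05)
print("genes called periodic:", reject.sum(),
      "of which truly periodic:", reject[:n_periodic].sum())
```

The alternating sum in the exact p-value is numerically benign for short series like these, with q around 10. For much longer series, it can lose precision through cancellation between large binomial terms.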