Discovering Statistically Significant Periodic Gene Expression

A frequent application of microarray experiments is monitoring gene activity in a cell during the cell cycle or cell division. Such experiments produce high-throughput gene expression time series data. A new computational and statistical challenge in analyzing these time course data is to discover genes whose expression is statistically significantly periodic during the cell cycle. The challenge arises from the large number of genes measured simultaneously, the small to moderate number of measurements per gene taken at different time points, and the high levels of non-normal random noise inherent in the data. Computational and statistical approaches to the discovery and validation of periodic patterns of gene expression are, however, very limited. A good method of analysis should be able to search for significantly periodic genes while controlling the family-wise error (FWE) rate, the false discovery rate (FDR), or a variant of the FDR when all gene expression profiles are compared simultaneously. This review paper gives a brief summary of the methods currently used to search for periodic genes and surveys two of them in detail. The first is a novel statistical inference approach, the C & G Procedure, which can effectively detect statistically significantly periodically expressed genes when gene expression is measured at evenly spaced time points. The second is Lomb-Scargle periodogram analysis, which can be used to discover periodic genes when the gene profiles are not measured at evenly spaced time points or when the profiles contain missing values. The ultimate goal of this review is to give an exposition of the two surveyed methods for researchers in related fields. Copyright (c) 2008 The Authors. Journal compilation (c) 2008 International Statistical Institute.
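To make the second surveyed idea concrete, the following is a minimal Python sketch, not the authors' implementation, of a normalized Lomb-Scargle periodogram for unevenly spaced time points, followed by an FDR screen across genes. Several pieces are assumptions introduced here for illustration: the function names, the simulated profiles, the trial period grid, the peak p-value approximation 1 - (1 - e^{-z})^M with M set to the number of trial frequencies, and the use of the Benjamini-Hochberg step-up rule as one standard FDR-controlling procedure. The C & G Procedure is not reproduced.

```python
import math
import random


def lomb_scargle(times, values, omegas):
    """Normalized Lomb-Scargle power at each angular frequency in `omegas`."""
    n = len(times)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    power = []
    for w in omegas:
        # Time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2 * w * t) for t in times)
        c2 = sum(math.cos(2 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2 * w)
        cos_t = [math.cos(w * (t - tau)) for t in times]
        sin_t = [math.sin(w * (t - tau)) for t in times]
        yc = sum((v - mean) * c for v, c in zip(values, cos_t))
        ys = sum((v - mean) * s for v, s in zip(values, sin_t))
        cc = sum(c * c for c in cos_t)
        ss = sum(s * s for s in sin_t)
        total = 0.0
        if cc > 0:
            total += yc * yc / cc
        if ss > 0:
            total += ys * ys / ss
        power.append(total / (2 * var))
    return power


def peak_p_value(power):
    """Approximate p-value of the highest peak (assumption: M = len(power))."""
    z = max(power)
    return 1.0 - (1.0 - math.exp(-z)) ** len(power)


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule; returns one reject flag per p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff = rank  # largest rank passing its threshold
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Hypothetical data: unevenly spaced sampling times (minutes) and five
# simulated profiles, the first with a 60-minute period, the rest pure noise.
random.seed(0)
times = [0, 7, 15, 20, 31, 38, 45, 50, 62, 70, 77, 85, 91, 100, 108, 115, 119, 126]
periods = list(range(20, 121, 10))            # trial periods in minutes
omegas = [2 * math.pi / p for p in periods]   # matching angular frequencies
profiles = [[math.sin(2 * math.pi * t / 60) + random.gauss(0, 0.3) for t in times]]
profiles += [[random.gauss(0, 1) for _ in times] for _ in range(4)]

p_values = [peak_p_value(lomb_scargle(times, y, omegas)) for y in profiles]
print(benjamini_hochberg(p_values, q=0.05))
```

The periodogram is written out in plain Python so that each term is visible. In practice a vetted library routine would be preferable, and the choice of M and of the frequency grid affects how well the p-values are calibrated.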
