BMC Bioinformatics (BioMed Central), Methodology article

Background

Periodic phenomena are widespread in biology. The problem of finding periodicity in biological time series can be viewed as multiple hypothesis testing of the spectral content of a given time series. The exact noise characteristics are unknown in many bioinformatics applications. Furthermore, the observed time series can exhibit other non-idealities, such as outliers, short length and distortion from the original waveform. Computational methods should therefore be robust against such anomalies in the data.

Results

We propose a general-purpose robust testing procedure for finding periodic sequences in multiple time series data. The method is based on a robust spectral estimator, which is incorporated into the hypothesis testing framework through the so-called g-statistic together with a correction for multiple testing. The resulting procedure is insensitive to heavy contamination by outliers, missing values, short time series and nonlinear distortions, and is completely insensitive to any monotone nonlinear distortion. We evaluate its performance through extensive simulations. In addition, we compare it with another recent statistical signal detection estimator that uses Fisher's test under a Gaussian noise assumption. The results show that the proposed robust method has markedly better robustness properties, and that it also performs better in the standard Gaussian case. Finally, we validate the method on real data, where it performs very favorably.

Conclusion

As time series measured from biological systems are usually short and prone to contain various kinds of non-idealities, we are optimistic about the many possible applications of the proposed robust statistical periodicity detection method.

Availability

The presented methods have been implemented in Matlab and in R. Code is available on request. Supplementary material is available at: http://www.cs.tut.fi/sgn/csb/robustperiodic/.
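The testing procedure outlined in the abstract (robust spectral estimate, g-statistic, multiple testing correction) can be illustrated with a short Python sketch. It is not the authors' Matlab/R implementation, and several choices are assumptions made here for illustration: a Spearman-type rank autocorrelation stands in for the robust estimator, the spectrum is a correlogram evaluated at the Fourier frequencies and clipped at zero, the g-statistic is the largest spectral ordinate divided by the sum of all ordinates, p-values come from a permutation test (the exact null distribution of Fisher's g-statistic is derived for Gaussian white noise), and the Benjamini-Hochberg procedure serves as the multiple testing correction. The function names `robust_spectrum`, `g_statistic`, `permutation_pvalue` and `benjamini_hochberg`, and the default `max_lag`, are hypothetical.

```python
# Illustrative sketch only (assumptions, not the authors' code): rank-based
# autocorrelations, a clipped correlogram spectrum, a g-statistic,
# permutation p-values and a Benjamini-Hochberg correction.
import numpy as np


def _rank_corr(a, b):
    """Spearman-type rank correlation (ties broken by position; fine for continuous data)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    denom = np.sqrt(np.sum(ra ** 2) * np.sum(rb ** 2))
    return float(np.sum(ra * rb) / denom) if denom > 0 else 0.0


def robust_spectrum(x, max_lag=None):
    """Correlogram spectral estimate built from rank autocorrelations.

    Evaluated at the Fourier frequencies 2*pi*j/N, j = 1..(N-1)//2.
    Assumption: the N//2 default for max_lag is illustrative only.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    max_lag = n // 2 if max_lag is None else max_lag
    lags = np.arange(1, max_lag + 1)
    r = np.array([_rank_corr(x[:-k], x[k:]) for k in lags])
    omega = 2 * np.pi * np.arange(1, (n - 1) // 2 + 1) / n
    s = 1.0 + 2.0 * np.cos(np.outer(omega, lags)) @ r  # lag-0 correlation is 1
    # Rank correlograms need not be non-negative definite, so clip at zero.
    return np.maximum(s, 0.0)


def g_statistic(x, max_lag=None):
    """Fisher-type g-statistic: largest spectral ordinate over the sum of ordinates."""
    s = robust_spectrum(x, max_lag)
    total = s.sum()
    return float(s.max() / total) if total > 0 else 0.0


def permutation_pvalue(x, n_perm=199, seed=0, max_lag=None):
    """p-value under an exchangeable (no temporal order) null, estimated by shuffling."""
    rng = np.random.default_rng(seed)
    g_obs = g_statistic(x, max_lag)
    exceed = sum(
        g_statistic(rng.permutation(x), max_lag) >= g_obs for _ in range(n_perm)
    )
    return (1 + exceed) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (false discovery rate control)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(24)
    periodic = np.sin(2 * np.pi * t / 8) + 0.3 * rng.standard_normal(24)
    periodic[5] += 10.0  # one gross outlier
    noise = rng.standard_normal(24)
    pvals = [permutation_pvalue(periodic), permutation_pvalue(noise)]
    print(pvals, benjamini_hochberg(pvals))
```

Because only the ranks of the observations enter these statistics, applying a strictly increasing transform such as `np.exp` to a series leaves `g_statistic` and, for a fixed seed, `permutation_pvalue` unchanged; in this sketch, that is where the insensitivity to monotone nonlinear distortions comes from.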
