Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data

Background: In practice, many biological time series measurements, including gene microarray experiments, are taken at time points the biologist considers interesting rather than at fixed intervals. In many cases the goal is to find targets that are expressed periodically. To tackle the problems of uneven sampling and an unknown noise type in periodicity detection, we propose to use robust regression.

Methods: The aim of this paper is to develop a general framework for robust periodicity detection and to review and rank different approaches by means of simulations. We also show results for some real measurement data.

Results: The simulation results clearly show that as the sampling of the time series becomes increasingly uneven, the methods that assume even sampling become unusable. We find that M-estimation provides a good compromise between robustness and computational efficiency.

Conclusion: Since uneven sampling occurs often in biological measurements, the robust methods developed in this paper are expected to have many uses. The regression-based formulation of the periodicity detection problem adapts easily to non-uniform sampling. Using robust regression helps to reject inconsistently behaving data points.

Availability: The implementations are currently available for Matlab and will be made available for users of R as well. More information can be found in the web supplement [1].
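The regression-based formulation summarized above can be illustrated with a minimal sketch. It is not the authors' Matlab implementation. It assumes a single-sinusoid model y ≈ a·cos(2πft) + b·sin(2πft) + c fitted at each trial frequency, and it swaps in one common form of M-estimation: Huber weights, iteratively reweighted least squares, a MAD scale estimate, and the tuning constant 1.345. The function names `robust_sinusoid_fit` and `robust_periodogram` are hypothetical, chosen for this illustration only.

```python
import numpy as np

def huber_weights(r, k=1.345):
    """Huber weights: 1 for small scaled residuals, k/|r| for large ones."""
    a = np.abs(r)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def robust_sinusoid_fit(t, y, freq, n_iter=50, tol=1e-8):
    """Fit y ~ a*cos(2*pi*f*t) + b*sin(2*pi*f*t) + c by Huber M-estimation (IRLS)."""
    # The design matrix uses the actual sampling times, so uneven spacing is no obstacle.
    X = np.column_stack([np.cos(2 * np.pi * freq * t),
                         np.sin(2 * np.pi * freq * t),
                         np.ones_like(t)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale via the median absolute deviation (assumed choice).
        scale = max(1.4826 * np.median(np.abs(r - np.median(r))), 1e-12)
        sw = np.sqrt(huber_weights(r / scale))
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def robust_periodogram(t, y, freqs):
    """Squared amplitude a^2 + b^2 of the robust fit at each trial frequency."""
    return np.array([np.sum(robust_sinusoid_fit(t, y, f)[:2] ** 2) for f in freqs])

# Synthetic example: non-uniform sampling, true frequency 0.5, one gross outlier.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, size=40))
y = 2.0 * np.sin(2 * np.pi * 0.5 * t) + 0.1 * rng.standard_normal(t.size)
y[5] += 15.0
freqs = np.linspace(0.1, 1.0, 91)
spec = robust_periodogram(t, y, freqs)
best = freqs[np.argmax(spec)]                      # frequency with the largest robust amplitude
amp = np.sqrt(np.sum(robust_sinusoid_fit(t, y, 0.5)[:2] ** 2))
```

In this sketch the outlier is downweighted rather than removed, so it has only a bounded influence on the fitted amplitude. Deciding whether a peak is significant would need a separate test, which is not shown here.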

[1]  David G. Stork,et al.  Pattern Classification , 1973 .

[2]  David G. Stork,et al.  Pattern classification, 2nd Edition , 2000 .

[3]  Peter Frick,et al.  Wavelet Analysis of Stellar Chromospheric Activity Variations , 1997 .

[4]  Richard A. Davis,et al.  Time Series: Theory and Methods , 2013 .

[5]  Werner A. Stahel,et al.  New directions in statistical data analysis and robustness. Proceedings of the Workshop on Data Analysis and Robustness held in Ascona, 1992 , 1994 .

[6]  Korbinian Strimmer,et al.  Identifying periodically expressed transcripts in microarray time series data , 2008, Bioinform..

[7]  Peer Bork,et al.  Comparison of computational methods for the identification of cell cycle-regulated genes , 2005, Bioinform..

[8]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[9]  R. Fisher Tests of significance in harmonic analysis , 1929 .

[10]  Marvin H. J. Guber Bayesian Spectrum Analysis and Parameter Estimation , 1988 .

[11]  Richard A. Davis,et al.  Time Series: Theory and Methods (2nd ed.). , 1992 .

[12]  L. P. Zhao,et al.  Statistical modeling of large microarray data sets to identify stimulus-response profiles , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Martin Schader,et al.  Data Analysis: Scientific Modeling And Practical Application , 2000 .

[14]  Jie Chen,et al.  Detecting periodic patterns in unevenly spaced gene expression time series using Lomb–Scargle periodograms , 2006, Bioinform..

[15]  J. Scargle Studies in astronomical time series analysis. II - Statistical aspects of spectral analysis of unevenly spaced data , 1982 .

[16]  Jon Wakefield,et al.  Bayesian Analysis of Cell-Cycle Gene Expression Data , 2005 .

[17]  C. Jennison,et al.  Robust Statistics: The Approach Based on Influence Functions , 1987 .

[18]  Andrzej Tarczynski,et al.  Optimal periodic sampling sequences for nearly-alias-free digital signal processing , 2005, 2005 IEEE International Symposium on Circuits and Systems.

[19]  Anders Berglund,et al.  A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription , 2003, Bioinform..

[20]  G. Moody,et al.  Power spectral density of unevenly sampled data by least-square analysis: performance and application to heart rate signals , 1998, IEEE Transactions on Biomedical Engineering.

[21]  Shyamal D Peddada,et al.  A random-periods model for expression of cell-cycle genes. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[22]  Peter J. Rousseeuw,et al.  Robust Regression and Outlier Detection , 2005, Wiley Series in Probability and Statistics.

[23]  A. Schwarzenberg-Czerny Fast and Statistically Optimal Period Search in Uneven Sampled Observations , 1996 .

[24]  Zhaohui S. Qin,et al.  Statistical resynchronization and Bayesian detection of periodically expressed genes. , 2004, Nucleic acids research.

[25]  Werner A. Stahel,et al.  Robust Statistics: The Approach Based on Influence Functions , 1987 .

[26]  Roberto Tagliaferri,et al.  Neural networks for periodicity analysis of unevenly spaced data , 1997, Proceedings of International Conference on Neural Networks (ICNN'97).

[27]  C. Hurvich,et al.  High Breakdown Methods of Time Series Analysis , 1993 .

[28]  Stefan Van Aelst,et al.  Robust Multivariate Regression , 2004, Technometrics.

[29]  Timo I. Laakso,et al.  Spectrum estimation of non-uniformly sampled signals , 1996, Proceedings of IEEE International Symposium on Industrial Electronics.

[30]  Hongzhe Li,et al.  Model-based methods for identifying periodically expressed genes based on time course microarray gene expression data , 2004, Bioinform..

[31]  Mats G. Gustafsson,et al.  Bayesian detection of periodic mRNA time profiles without use of training examples , 2006, BMC Bioinformatics.

[32]  Heimo Ihalainen,et al.  A wavelet based method for the estimation of the power spectrum from irregularly sampled data , 1998 .

[33]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[34]  Ronald K. Pearson,et al.  BMC Bioinformatics BioMed Central Methodology article , 2005 .

[35]  Ziv Bar-Joseph,et al.  Active learning for sampling in time-series experiments with application to gene expression analysis , 2005, ICML.

[36]  D. B. Preston Spectral Analysis and Time Series , 1983 .

[37]  Robert R. Klevecz,et al.  Dynamic architecture of the yeast cell cycle uncovered by wavelet decomposition of expression microarray data , 2000, Functional & Integrative Genomics.

[38]  Yuan Qi,et al.  Bayesian spectrum estimation of unevenly sampled nonstationary data , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[39]  H. Hartley,et al.  Tests of significance in harmonic analysis. , 1949, Biometrika.

[40]  Andrzej Tarczynski,et al.  Spectrum estimation of nonuniformly sampled signals , 2002, 2002 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No.02TH8628).

[41]  Ronald W. Davis,et al.  Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray , 1995, Science.

[42]  Peter J. Rousseeuw,et al.  Robust regression and outlier detection , 1987 .

[43]  Katrien van Driessen,et al.  A Fast Algorithm for the Minimum Covariance Determinant Estimator , 1999, Technometrics.

[44]  Jie Chen,et al.  Identification of significant periodic genes in microarray gene expression data , 2005, BMC Bioinformatics.

[45]  P. Djurić,et al.  Bayesian spectrum estimation of harmonic signals , 1995, IEEE Signal Processing Letters.

[46]  Gene H. Golub,et al.  Missing value estimation for DNA microarray gene expression data: local least squares imputation , 2005, Bioinform..

[47]  M Schimmel,et al.  Emphasizing Difficulties in the Detection of Rhythms with Lomb-Scargle Periodograms , 2001, Biological rhythm research.

[48]  Peter J. Huber,et al.  Robust Statistics , 2005, Wiley Series in Probability and Statistics.

[49]  Peter J. Rousseeuw,et al.  An Algorithm for Positive-Breakdown Regression Based on Concentration Steps , 2000 .

[50]  Ronald K. Pearson,et al.  Mining imperfect data - dealing with contamination and incomplete records , 2005 .