Periodicity Detection Method for Small-Sample Time Series Datasets

Gene expression time series often exhibit periodic behavior under the influence of multiple signaling pathways, and are represented by a model that incorporates multiple harmonics and noise. Most such data, which are measured using DNA microarrays, contain only a few sampling points in time, whereas most periodicity detection methods require a relatively large number of sampling points. We have previously developed a detection algorithm based on the discrete Fourier transform and Akaike's information criterion. Here we demonstrate the performance of the algorithm on small-sample time series data through a comparison with conventional and newly proposed periodicity detection methods based on a statistical analysis of the power of harmonics. We show that our method has higher sensitivity for data consisting of multiple harmonics, and is more robust against noise than the other methods. Although the algorithm suffers from "combinatorial explosion" on large datasets, its computation time is not a problem for small-sample datasets. The MATLAB/GNU Octave script of the algorithm is available on the author's web site: http://www.cbrc.jp/%7Etominaga/piccolo/.
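The general idea of combining the discrete Fourier transform with Akaike's information criterion can be illustrated with a minimal Python sketch. It is not the authors' MATLAB/GNU Octave script. The function names (`harmonic_powers`, `aic`, `select_harmonics`), the Gaussian form of the AIC, the parameter count of one mean plus two coefficients per harmonic, and the `max_harmonics` cap are all assumptions made for illustration. The sketch assumes evenly spaced samples, scores every subset of Fourier harmonics, and keeps the subset with the lowest AIC. The exhaustive subset search is what grows combinatorially with the number of sampling points.

```python
import cmath
import itertools
import math


def harmonic_powers(x):
    """Sum of squares explained by each Fourier harmonic k = 1..(n-1)//2.

    Assumes evenly spaced samples; the Nyquist term (even n) is left out.
    """
    n = len(x)
    mean = sum(x) / n
    powers = {}
    for k in range(1, (n - 1) // 2 + 1):
        coef = sum((v - mean) * cmath.exp(-2j * math.pi * k * t / n)
                   for t, v in enumerate(x))
        # For a pure cosine of amplitude A this equals A**2 * n / 2.
        powers[k] = 2.0 * abs(coef) ** 2 / n
    return powers


def aic(rss, n, n_params, eps=1e-12):
    """Gaussian AIC up to an additive constant (assumed form)."""
    return n * math.log(max(rss, eps) / n) + 2 * n_params


def select_harmonics(x, max_harmonics=3):
    """Return the tuple of harmonic indices with the lowest AIC.

    An empty tuple means the mean-only (non-periodic) model won.
    """
    n = len(x)
    mean = sum(x) / n
    total_ss = sum((v - mean) ** 2 for v in x)
    powers = harmonic_powers(x)
    best_score, best_subset = aic(total_ss, n, 1), ()
    for m in range(1, max_harmonics + 1):
        if 2 * m + 1 >= n:  # keep fewer parameters than samples
            break
        # Exhaustive search over subsets: the source of combinatorial growth.
        for subset in itertools.combinations(sorted(powers), m):
            # Fourier harmonics are orthogonal, so explained power adds up.
            rss = total_ss - sum(powers[k] for k in subset)
            score = aic(rss, n, 2 * m + 1)
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset
```

Calling `select_harmonics(series)` on a short, evenly sampled expression profile returns the indices of the harmonics retained by the criterion. Under these assumptions, a non-empty result is read as evidence of periodicity. With very few samples the plain AIC tends to admit spurious harmonics, so a small-sample correction may be preferable in practice.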
