Detecting periodicities with Gaussian processes

We consider the problem of detecting and quantifying the periodic component of a function given noise-corrupted observations of a limited number of input/output tuples. Our approach is based on Gaussian process regression, which provides a flexible non-parametric framework for modelling periodic data. We introduce a novel decomposition of the covariance function as the sum of a periodic and an aperiodic kernel. This decomposition yields sub-models that capture the periodic part of the signal and its complement. To quantify the periodicity of the signal, we derive a periodicity ratio which reflects the uncertainty in the fitted sub-models. Although the method can be applied to many kernels, we place special emphasis on the Matérn family, from the expression of the reproducing kernel Hilbert space inner product to the implementation of the associated periodic kernels in a Gaussian process toolkit. The proposed method is illustrated by the detection of periodically expressed genes in the Arabidopsis genome.
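As a rough illustration of the sub-model idea, the sketch below fits a Gaussian process whose covariance is the sum of a periodic and an aperiodic kernel, then splits the posterior mean into the two corresponding parts. It is not the decomposition derived in the paper: it assumes a standard exp-sine-squared periodic kernel in place of the RKHS-based periodic Matérn kernels, fixed hyperparameters, a known period, and synthetic data. All function names are hypothetical. The final ratio is a crude plug-in computed from posterior means only, so it ignores the uncertainty that the paper's periodicity ratio accounts for.

```python
import numpy as np

def periodic_kernel(a, b, period=1.0, lengthscale=1.0, variance=1.0):
    # Standard exp-sine-squared kernel (an assumption, swapped in for the
    # paper's periodic Matern kernels).
    d = a[:, None] - b[None, :]
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def matern32_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Matern 3/2 kernel, used here as the aperiodic part.
    r = np.abs(a[:, None] - b[None, :]) / lengthscale
    return variance * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def submodel_means(x, y, x_new, noise=0.01, period=1.0):
    # Additive GP: k = k_p + k_a. Each sub-model mean is its own
    # cross-covariance applied to the shared weights (K + noise * I)^{-1} y.
    K = periodic_kernel(x, x, period) + matern32_kernel(x, x) + noise * np.eye(len(x))
    alpha = np.linalg.solve(K, y)
    mean_p = periodic_kernel(x_new, x, period) @ alpha
    mean_a = matern32_kernel(x_new, x) @ alpha
    return mean_p, mean_a

# Synthetic data (an assumption): period-1 sine plus a linear trend plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 4.0, 40))
y = np.sin(2.0 * np.pi * x) + 0.3 * x + 0.1 * rng.standard_normal(40)

x_new = np.linspace(0.0, 4.0, 200)
mean_p, mean_a = submodel_means(x, y, x_new)

# Crude plug-in proxy for a periodicity ratio; uses means only.
ratio = np.var(mean_p) / np.var(mean_p + mean_a)
print(f"plug-in periodicity proxy: {ratio:.3f}")
```

By construction the periodic sub-model mean repeats with the chosen period, and the two sub-model means add up to the usual posterior mean of the full model. Because the proxy ignores the covariance between the two parts, it is not guaranteed to lie in [0, 1].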
