Clustering of discretely observed diffusion processes

A new distance for classifying time series is proposed. The underlying generating process is assumed to be a diffusion process that solves a stochastic differential equation and is observed at discrete times. The mesh of observations is not required to shrink to zero. The new dissimilarity measure is based on the L^1 distance between the Markov operators estimated on two observed paths. Simulation experiments are used to analyze the performance of the proposed distance under several conditions, including perturbation and misspecification. As an example, real financial data from NYSE/NASDAQ stocks are analyzed, and evidence is provided that the new distance appears able to capture differences in both the drift and diffusion coefficients better than other commonly used non-parametric distances. Corresponding software is available in the add-on package sde for the R statistical environment.
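To make the idea of an L^1 distance between estimated Markov operators concrete, here is a minimal Python sketch. It is not the estimator from the paper or the implementation in the sde package: the abstract does not specify the basis or the estimator, so the sketch assumes a piecewise-constant (indicator) basis on a common grid and a symmetrized empirical transition matrix. All function names (`simulate_ou`, `markov_operator_matrix`, `l1_operator_distance`) and the Ornstein-Uhlenbeck test model are hypothetical choices made for illustration.

```python
import math
import random


def simulate_ou(theta, mu, sigma, x0, n, dt, rng):
    """Simulate dX = theta*(mu - X) dt + sigma dW at fixed step dt.

    Uses the exact Gaussian transition law, so dt does not need to be small.
    """
    a = math.exp(-theta * dt)
    sd = sigma * math.sqrt((1.0 - a * a) / (2.0 * theta))
    path = [x0]
    for _ in range(n):
        path.append(mu + (path[-1] - mu) * a + sd * rng.gauss(0.0, 1.0))
    return path


def markov_operator_matrix(path, lo, hi, m):
    """Symmetrized empirical operator matrix on m indicator basis functions.

    Assumption: the basis is a set of indicators of m equal cells on [lo, hi].
    Each observed transition (cell j -> cell k) adds 1/(2N) to entries
    (j, k) and (k, j), so all entries sum to 1.
    """
    width = (hi - lo) / m

    def cell(x):
        # Clamp so that x == hi falls in the last cell.
        return min(m - 1, max(0, int((x - lo) / width)))

    n = len(path) - 1
    mat = [[0.0] * m for _ in range(m)]
    for prev, cur in zip(path, path[1:]):
        j, k = cell(prev), cell(cur)
        mat[j][k] += 0.5 / n
        mat[k][j] += 0.5 / n
    return mat


def l1_operator_distance(x, y, m=20):
    """Entrywise L^1 distance between the two estimated operator matrices."""
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    if hi == lo:
        return 0.0  # both paths are the same constant
    px = markov_operator_matrix(x, lo, hi, m)
    py = markov_operator_matrix(y, lo, hi, m)
    return sum(abs(px[j][k] - py[j][k]) for j in range(m) for k in range(m))


if __name__ == "__main__":
    rng = random.Random(0)
    # Two paths from the same model and one with a larger diffusion coefficient.
    x = simulate_ou(1.0, 0.0, 0.5, 0.0, 2000, 1.0, rng)
    y = simulate_ou(1.0, 0.0, 0.5, 0.0, 2000, 1.0, rng)
    z = simulate_ou(1.0, 0.0, 1.5, 0.0, 2000, 1.0, rng)
    print("same model:     ", l1_operator_distance(x, y))
    print("different sigma:", l1_operator_distance(x, z))
```

Because the matrices are built from consecutive pairs of observations at a fixed step, the sketch never relies on the sampling interval being small, which mirrors the abstract's remark about the observation mesh. The resulting pairwise distances could then be passed to any standard clustering routine.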
