GRATIS: GeneRAting TIme Series with diverse and controllable characteristics

The rapid growth of time series data in recent years has prompted many new time series analysis methods for forecasting, clustering, classification and other tasks. Evaluating these methods requires a diverse collection of benchmarking time series so that they can be compared reliably against alternative approaches. We propose GRATIS (GeneRAting TIme Series with diverse and controllable characteristics), which uses mixture autoregressive (MAR) models to generate time series. We generate sets of time series from MAR models and investigate their diversity and coverage in a time series feature space. By tuning the parameters of the MAR models, GRATIS can also efficiently generate new time series with controllable features. As a cost-free surrogate for traditional data collection, GRATIS can serve as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application.
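
To make the generation idea concrete, the following is a minimal sketch of simulating from a Gaussian mixture autoregressive process: at each time step one AR component is drawn according to the mixing weights, and the next value is that component's autoregressive mean plus Gaussian noise. The function name simulate_mar, its arguments, the burn-in length and the parameter values in the usage lines are all illustrative assumptions, not the settings or the implementation used by GRATIS.

```python
import random


def simulate_mar(n, weights, intercepts, ar_coefs, sigmas, burn_in=100, seed=None):
    """Simulate n values from a Gaussian mixture autoregressive (MAR) process.

    Hypothetical helper for illustration only.
    weights[k]    : mixing probability of component k (must sum to 1)
    intercepts[k] : constant term of component k
    ar_coefs[k]   : list of AR coefficients of component k (lag 1, lag 2, ...)
    sigmas[k]     : innovation standard deviation of component k
    """
    k = len(weights)
    if not (len(intercepts) == len(ar_coefs) == len(sigmas) == k):
        raise ValueError("all component parameter lists must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")

    rng = random.Random(seed)
    max_lag = max(len(coefs) for coefs in ar_coefs)
    series = [0.0] * max_lag  # zero start-up values, discarded with the burn-in

    for _ in range(n + burn_in):
        # Pick which AR component generates this observation.
        j = rng.choices(range(k), weights=weights)[0]
        # Conditional mean of the chosen component given the past values.
        mean = intercepts[j]
        for lag, phi in enumerate(ar_coefs[j], start=1):
            mean += phi * series[-lag]
        series.append(mean + sigmas[j] * rng.gauss(0.0, 1.0))

    # Drop the start-up values and the burn-in period.
    return series[max_lag + burn_in:]


# Illustrative two-component example (parameter values are assumptions).
x = simulate_mar(
    n=200,
    weights=[0.6, 0.4],
    intercepts=[0.0, 1.0],
    ar_coefs=[[0.5], [-0.3, 0.2]],
    sigmas=[1.0, 2.0],
    seed=42,
)
print(len(x))
```

Changing the weights, coefficients or noise scales changes the character of the simulated series, which is the sense in which the parameters can be tuned; choosing them to hit specific target features would need a search procedure that this sketch does not attempt.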
