tsGT: Stochastic Time Series Modeling With Transformer

Time series methods are of fundamental importance in virtually every field of science that deals with temporally structured data. Recently, there has been a surge of deterministic transformer models with time-series-specific architectural biases. In this paper, we go in a different direction by introducing tsGT, a stochastic time series model built on a general-purpose transformer architecture. We focus on a well-known and theoretically justified rolling-window backtesting and evaluation protocol. We show that tsGT outperforms state-of-the-art models on mean absolute deviation (MAD) and root mean squared error (RMSE), and surpasses its stochastic peers on quantile loss (QL) and continuous ranked probability score (CRPS), on four commonly used datasets. We complement these results with a detailed analysis of tsGT's ability to model the data distribution and predict marginal quantile values.
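
Since the rolling-window protocol and the four metrics carry the empirical claims, a minimal sketch of how such an evaluation can be computed may help. The `model.sample` interface, the non-overlapping window step, and the 0.9 quantile level below are illustrative assumptions, not the paper's actual implementation; the pinball (quantile) loss and the sample-based CRPS estimator follow their standard definitions.

```python
import numpy as np

def quantile_loss(y_true, y_pred_q, q):
    """Pinball (quantile) loss for a single quantile level q."""
    diff = y_true - y_pred_q
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

def crps_from_samples(y_true, samples):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|,
    where X, X' are independent forecast samples."""
    term1 = np.mean(np.abs(samples - y_true))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

def rolling_window_backtest(series, model, context_len, horizon, n_samples=100):
    """Slide a fixed-size context window over the series; at each step,
    draw forecast sample paths and score them against the held-out horizon."""
    scores = []
    for t in range(context_len, len(series) - horizon + 1, horizon):
        context = series[t - context_len:t]
        target = series[t:t + horizon]
        # model.sample is a hypothetical interface returning
        # an array of shape (n_samples, horizon) of sampled paths.
        paths = model.sample(context, horizon, n_samples)
        point = np.median(paths, axis=0)  # point forecast for MAD/RMSE
        scores.append({
            "mad": np.mean(np.abs(target - point)),
            "rmse": np.sqrt(np.mean((target - point) ** 2)),
            "crps": np.mean([crps_from_samples(y, paths[:, h])
                             for h, y in enumerate(target)]),
            "ql@0.9": quantile_loss(target, np.quantile(paths, 0.9, axis=0), 0.9),
        })
    return scores
```

Note that the CRPS estimator above is the standard empirical form for i.i.d. forecast samples; in practice QL is typically averaged over a grid of quantile levels rather than the single level shown here.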
