tsGT: Stochastic Time Series Modeling With Transformer
Piotr Kozakowski | Łukasz Kuciński | Piotr Miłoś | Witold Drzewakowski | Mateusz Olko | Łukasz Maziarka | Marta Emilia Nowakowska | Łukasz Kaiser
[1] Tara N. Sainath, et al. Gemini: A Family of Highly Capable Multimodal Models, 2023, ArXiv.
[2] Christopher D. Manning, et al. Holistic Evaluation of Language Models, 2023, Trans. Mach. Learn. Res.
[3] Haoming Jiang, et al. Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond, 2023, ACM Trans. Knowl. Discov. Data.
[4] Wayne Xin Zhao, et al. A Survey of Large Language Models, 2023, ArXiv.
[5] Henrique Pondé de Oliveira Pinto, et al. GPT-4 Technical Report, 2023, ArXiv 2303.08774.
[6] Sercan Ö. Arik, et al. TSMixer: An all-MLP Architecture for Time Series Forecasting, 2023, ArXiv.
[7] Christian Szegedy, et al. Magnushammer: A Transformer-based Approach to Premise Selection, 2023, ArXiv.
[8] Lawrence Chan, et al. A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations, 2023, ICML.
[9] J. Steinhardt, et al. Progress measures for grokking via mechanistic interpretability, 2023, ICLR.
[10] J. Kalagnanam, et al. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers, 2022, ICLR.
[11] J. Steinhardt, et al. Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small, 2022, ArXiv.
[12] Tom B. Brown, et al. Language Models (Mostly) Know What They Know, 2022, ArXiv.
[13] Yuhuai Wu, et al. Solving Quantitative Reasoning Problems with Language Models, 2022, NeurIPS.
[14] Lerrel Pinto, et al. Behavior Transformers: Cloning k modes with one stone, 2022, NeurIPS.
[15] Yuhuai Wu, et al. Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search, 2022, ICLR.
[16] L. Zhang, et al. Are Transformers Effective for Time Series Forecasting?, 2022, AAAI.
[17] Nando de Freitas, et al. Towards Learning Universal Hyperparameter Optimizers with Transformers, 2022, NeurIPS.
[18] Sergio Gomez Colmenarejo, et al. A Generalist Agent, 2022, Trans. Mach. Learn. Res.
[19] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[20] Lisa Anne Hendricks, et al. Training Compute-Optimal Large Language Models, 2022, ArXiv.
[21] Junchi Yan, et al. Transformers in Time Series: A Survey, 2022, IJCAI.
[22] Jesse Michael Han, et al. Formal Mathematics Statement Curriculum Learning, 2022, ICLR.
[23] Tian Zhou, et al. FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting, 2022, ICML.
[24] Marc G. Bellemare, et al. Deep Reinforcement Learning at the Edge of the Statistical Precipice, 2021, NeurIPS.
[25] Konrad Czechowski, et al. Subgoal Search For Complex Reasoning Tasks, 2021, NeurIPS.
[26] Kashif Rasul, et al. Probabilistic Time Series Forecasting with Implicit Quantile Networks, 2021, ArXiv.
[27] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, ArXiv.
[28] Jianmin Wang, et al. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting, 2021, NeurIPS.
[29] Sergey Levine, et al. Offline Reinforcement Learning as One Big Sequence Modeling Problem, 2021, NeurIPS.
[30] Pieter Abbeel, et al. Decision Transformer: Reinforcement Learning via Sequence Modeling, 2021, NeurIPS.
[31] Jianlin Su, et al. RoFormer: Enhanced Transformer with Rotary Position Embedding, 2021, Neurocomputing.
[32] Rodrigo Nogueira, et al. Investigating the Limitations of Transformers with Simple Arithmetic Tasks, 2021, ArXiv 2102.13019.
[33] Florian Ziel, et al. CRPS Learning, 2021, Journal of Econometrics.
[34] Frank Rudzicz, et al. BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to Learn From Massive Amounts of EEG Data, 2021, Frontiers in Human Neuroscience.
[35] Yi Tay, et al. Efficient Transformers: A Survey, 2020, ACM Comput. Surv.
[36] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[37] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[38] Nicolas Loeff, et al. Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting, 2019, International Journal of Forecasting.
[39] Afroz Mohiuddin, et al. Forecasting Deep Learning Dynamics with Applications to Hyperparameter Tuning, 2019.
[40] Wenhu Chen, et al. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting, 2019, NeurIPS.
[41] Nicolas Chapados, et al. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting, 2019, ICLR.
[42] Philippe Naveau, et al. Estimation of the Continuous Ranked Probability Score with Limited Information and Applications to Ensemble Weather Forecasts, 2018, Mathematical Geosciences.
[43] Oriol Vinyals, et al. Neural Discrete Representation Learning, 2017, NIPS.
[44] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[45] Valentin Flunkert, et al. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks, 2017, International Journal of Forecasting.
[46] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016, ArXiv 1606.08415.
[47] George Athanasopoulos, et al. Forecasting: Principles and Practice, 2013.
[48] P. Brockwell, et al. Time Series: Theory and Methods, 2013.
[49] Thomas L. Burr, et al. Modeling Financial Time Series With S-PLUS, 2007, Technometrics.
[50] Michel Verleysen, et al. Vector quantization: a weighted version for time-series forecasting, 2005, Future Gener. Comput. Syst.
[51] Tae-Hwy Lee, et al. Forecasting volatility: A reality check based on option pricing, utility function, value-at-risk, and predictive likelihood, 2004.
[52] R. E. Donatelli, et al. Time Series Analysis, 2003, Statistics for Environmental Science and Management.
[53] Eric R. Ziegel, et al. Analysis of Financial Time Series, 2002, Technometrics.
[54] A. McNeil, et al. Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach, 2000.
[55] S. Hochreiter, et al. Long Short-Term Memory, 1997, Neural Computation.
[56] Paul H. Kupiec, et al. Techniques for Verifying the Accuracy of Risk Measurement Models, 1995.
[57] T. Hill. The Significant-Digit Phenomenon, 1995.
[58] R. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, 1982.
[59] Peter Whittle. Hypothesis Testing in Time Series Analysis, 1951.
[60] Liang Sun, et al. Power Time Series Forecasting by Pretrained LM, 2023, ArXiv.
[61] J. Choo, et al. Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift, 2022, ICLR.
[62] Hugo Touvron, et al. LLaMA: Open and Efficient Foundation Language Models, 2023, ArXiv.
[63] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[64] R. Gray. Vector Quantization, 2017, Encyclopedia of GIS.
[65] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[66] Alexander J. McNeil, et al. Quantitative Risk Management: Concepts, Techniques and Tools, Revised Edition, 2015.
[67] R. Sutton. The Bitter Lesson, 2019.
[68] Shayan Jawed. FQFormer: A Fully Quantile Transformer for Time Series Forecasting, 2022.