Transformers in Time Series: A Survey

Transformers have achieved superior performance in many natural language processing and computer vision tasks, which has also triggered great interest in the time series community. Among the many advantages of Transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling and has led to exciting progress in a variety of time series applications. In this paper, we systematically review Transformer schemes for time series modeling, highlighting both their strengths and their limitations. In particular, we examine the development of time series Transformers from two perspectives. From the perspective of network structure, we summarize the adaptations and modifications made to Transformers to accommodate the challenges of time series analysis. From the perspective of applications, we categorize time series Transformers by common tasks, including forecasting, anomaly detection, and classification. Empirically, we perform robustness analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers behave on time series data. Finally, we discuss and suggest future directions to provide useful research guidance.
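The survey's central premise is that self-attention lets every time step attend directly to every other step, regardless of temporal distance. To make that concrete, the snippet below is a minimal, illustrative single-head scaled dot-product self-attention over a multivariate time series window. It is a sketch under our own assumptions, not code from any surveyed model; the function name, shapes, and randomly initialized projection matrices are all hypothetical stand-ins for what a trained model would learn.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a time series window.

    x: (T, d) window of T time steps with d variables/features.
    w_q, w_k, w_v: (d, d_k) query/key/value projections (random here;
    a real model learns them).
    Returns: (T, d_k) context vectors, one per time step.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # (T, d_k) each
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (T, T): every step vs. every step
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # row-wise softmax
    return attn @ v                               # attention-weighted sum of values

# Toy usage: a 96-step window of a 7-variable series.
rng = np.random.default_rng(0)
T, d, d_k = 96, 7, 16
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d_k)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (96, 16)
```

Note the (T, T) score matrix: attention costs O(T^2) time and memory in the window length, which is precisely what motivates many of the efficiency-oriented architectural adaptations for long sequences that the survey catalogs.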
