CLMFormer: Mitigating Data Redundancy to Revitalize Transformer-based Long-Term Time Series Forecasting System

Long-term time-series forecasting (LTSF) plays a crucial role in many practical applications. Transformer and its variants have become the de facto backbone for LTSF, offering exceptional capabilities in processing long sequences. However, existing Transformer-based models, such as FEDformer and Informer, often reach their best validation performance after only a few epochs, indicating that the Transformer's capacity is underutilized and that the models tend to overfit. One factor contributing to this overfitting is the data redundancy introduced by the rolling forecasting setting used for data augmentation, which is especially pronounced for longer sequences, where adjacent training samples are highly similar. In this paper, we propose a novel approach that addresses this issue through curriculum learning and a memory-driven decoder. Specifically, we progressively introduce Bernoulli noise into the training samples, which breaks the high similarity between adjacent samples. To further improve forecasting accuracy, the memory-driven decoder captures seasonal tendencies and dependencies in the time series and exploits temporal relationships to support forecasting. Experimental results on six real-world LTSF benchmarks demonstrate that our approach can be seamlessly plugged into a variety of Transformer-based models, improving their LTSF performance by up to 30%.
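To make the curriculum-learning component concrete, the sketch below illustrates one plausible way to schedule Bernoulli noise over rolling-window training samples: the masking probability grows with the training epoch, so overlapping windows that start out nearly identical become increasingly decorrelated as training proceeds. The function name, the linear schedule, the zero-masking of corrupted entries, and the `max_rate` cap are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def bernoulli_noise_curriculum(window, epoch, max_epochs, max_rate=0.5, rng=None):
    """Apply curriculum-scheduled Bernoulli masking to one training window.

    Early epochs see nearly clean windows; later epochs see increasingly
    perturbed ones, reducing redundancy between overlapping rolling windows.
    The linear schedule and max_rate are assumptions for illustration.
    """
    rng = rng or np.random.default_rng()
    rate = max_rate * min(epoch / max_epochs, 1.0)   # curriculum schedule
    mask = rng.random(window.shape) < rate           # Bernoulli(rate) mask
    noisy = window.copy()
    noisy[mask] = 0.0                                # drop the masked entries
    return noisy

# Usage: two stride-1 rolling windows of a series are almost identical,
# but become less similar once the scheduled noise is applied late in training.
series = np.sin(np.linspace(0, 20, 200)).astype(np.float32)
w1, w2 = series[0:96], series[1:97]
n1 = bernoulli_noise_curriculum(w1, epoch=8, max_epochs=10)
n2 = bernoulli_noise_curriculum(w2, epoch=8, max_epochs=10)
```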
