Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
