Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
Wenhu Chen | Xifeng Yan | Xiaoyong Jin | Shiyang Li | Yu-Xiang Wang | Yao Xuan | Xiyou Zhou
[1] Valentin Flunkert, et al. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks, 2017, International Journal of Forecasting.
[2] John F. MacGregor, et al. Some Recent Advances in Forecasting and Control, 1968.
[3] Michael Y. Hu, et al. Forecasting with artificial neural networks: The state of the art, 1997.
[4] T. Poggio, et al. Deep vs. shallow networks: An approximation theory perspective, 2016, ArXiv.
[5] Douglas Eck, et al. An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation, 2018, ArXiv.
[6] Sander Bohte, et al. Conditional Time Series Forecasting with Convolutional Neural Networks, 2017, arXiv:1703.04691.
[7] Guokun Lai, et al. Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks, 2017, SIGIR.
[8] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[9] Jakob Uszkoreit, et al. A Decomposable Attention Model for Natural Language Inference, 2016, EMNLP.
[10] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[11] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[12] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[13] Lorenzo Rosasco, et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review, 2016, International Journal of Automation and Computing.
[14] Diederik P. Kingma, et al. GPU Kernels for Block-Sparse Weights, 2017.
[15] Nicolas Chapados, et al. Effective Bayesian Modeling of Groups of Related Count Time Series, 2014, ICML.
[16] Ugur Demiryurek, et al. Deep Learning: A Generic Approach for Extreme Condition Traffic Forecasting, 2017, SDM.
[17] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[18] Xifeng Yan, et al. Multi-step Deep Autoregressive Forecasting with Latent States, 2019.
[19] Aaron C. Courville, et al. Recurrent Batch Normalization, 2016, ICLR.
[20] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[21] Matthias W. Seeger, et al. Deep State Space Models for Time Series Forecasting, 2018, NeurIPS.
[22] Gwilym M. Jenkins, et al. Time series analysis, forecasting and control, 1971.
[23] Yoshua Bengio, et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches, 2014, SSST@EMNLP.
[24] K. Torkkola, et al. A Multi-Horizon Quantile Recurrent Forecaster, 2017, arXiv:1711.11053.
[25] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[26] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[27] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[28] Daniel Jurafsky, et al. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context, 2018, ACL.
[29] Kilian Q. Weinberger, et al. CondenseNet: An Efficient DenseNet Using Learned Group Convolutions, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[30] J. Yosinski, et al. Time-series Extreme Event Forecasting with Neural Networks at Uber, 2017.
[31] Alex Smola, et al. Deep Factors with Gaussian Processes for Forecasting, 2018, ArXiv.
[32] Lukasz Kaiser, et al. Generating Wikipedia by Summarizing Long Sequences, 2018, ICLR.
[33] P. Young, et al. Time series analysis, forecasting and control, 1972, IEEE Transactions on Automatic Control.
[34] Yisong Yue, et al. Long-term Forecasting using Tensor-Train RNNs, 2017, ArXiv.
[35] Melvin J. Hinich, et al. Time Series Analysis by State Space Methods, 2001.
[36] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[37] Inderjit S. Dhillon, et al. Temporal Regularized Matrix Factorization for High-dimensional Time Series Prediction, 2016, NIPS.
[38] Ke Li, et al. A Time-Restricted Self-Attention Layer for ASR, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Yisong Yue, et al. Long-term Forecasting using Higher Order Tensor RNNs, 2017.
[40] Evangelos Spiliotis, et al. The M4 Competition: Results, findings, conclusion and way forward, 2018, International Journal of Forecasting.
[41] Kunihiko Fukushima, et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, 1980, Biological Cybernetics.