Preserving Dynamic Attention for Long-Term Spatial-Temporal Prediction

Effective long-term prediction is increasingly demanded in citywide data mining systems. Many practical applications, such as accident prevention and resource pre-allocation, require an extended preparation period. However, long-term prediction is highly error-sensitive, and this becomes even more critical when predicting citywide phenomena with complicated and dynamic spatial-temporal correlations. Specifically, because only a limited portion of the correlations is valuable, the enormous number of irrelevant features introduces noise that increases prediction error. Moreover, after each time step, errors can traverse the correlations and reach every spatial-temporal position in future predictions, leading to significant error propagation. To address these issues, we propose a Dynamic Switch-Attention Network (DSAN) with a novel Multi-Space Attention (MSA) mechanism that explicitly measures the correlations between inputs and outputs. To filter out irrelevant noise and alleviate error propagation, DSAN dynamically extracts valuable information by applying self-attention over the noisy input and bridges each output directly to the purified inputs through a switch-attention mechanism. Through extensive experiments on two spatial-temporal prediction tasks, we demonstrate the advantages of DSAN in both short-term and long-term prediction. The source code is available at https://github.com/hxstarklin/DSAN.
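The abstract describes a two-stage design: self-attention first purifies the noisy historical input, and a switch-attention (cross-attention) stage then lets every output step attend directly to that purified input rather than depending on step-by-step decoding, which limits error propagation. The following is a minimal sketch of that idea using plain PyTorch multi-head attention; the module name, layer sizes, and the way output queries are supplied are assumptions for illustration and do not reproduce the paper's full Multi-Space Attention design.

```python
import torch
import torch.nn as nn


class SwitchAttentionSketch(nn.Module):
    """Illustrative two-stage attention, loosely following the abstract:
    (1) self-attention over the noisy historical features ("purification"),
    (2) cross-attention from each future-step query to the purified input
        ("switch-attention"), bridging outputs directly to inputs.
    This is a sketch, not the paper's exact DSAN/MSA architecture.
    """

    def __init__(self, d_model: int = 64, num_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.switch_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, history: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
        # history: (batch, input_len, d_model) noisy spatial-temporal features
        # queries: (batch, horizon, d_model) one query per future time step
        purified, _ = self.self_attn(history, history, history)        # stage 1: filter noise
        bridged, _ = self.switch_attn(queries, purified, purified)     # stage 2: attend to purified input
        return self.out_proj(bridged)                                  # (batch, horizon, d_model)


if __name__ == "__main__":
    model = SwitchAttentionSketch()
    hist = torch.randn(2, 12, 64)      # 12 historical steps
    future_q = torch.randn(2, 6, 64)   # queries for 6 future steps
    print(model(hist, future_q).shape)  # torch.Size([2, 6, 64])
```

Because every future-step query attends to the purified history in a single hop, an error made at one output position does not have to flow through intermediate predictions to reach later positions, which is the intuition behind the long-term robustness claimed in the abstract.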
