Vision-Guided Forecasting - Visual Context for Multi-Horizon Time Series Forecasting

Autonomous driving has gained huge traction in recent years, due to its potential to change the way we commute. Much effort has been put into estimating the current state of a vehicle. Learning to forecast the future state of a vehicle, meanwhile, introduces new capabilities, such as predicting dangerous situations. Moreover, forecasting brings new supervision opportunities by learning to predict a richer context, expressed by multiple horizons. Intuitively, a video stream from a front-facing camera is necessary because it encodes information about the upcoming road. In addition, historical traces of the vehicle’s states give more context. In this paper we tackle multi-horizon forecasting of vehicle states by fusing the two modalities. We design and experiment with three end-to-end architectures that exploit 3D convolutions for visual feature extraction and 1D convolutions for feature extraction from speed and steering angle traces. To demonstrate the effectiveness of our method, we perform extensive experiments on two publicly available real-world datasets, Comma2k19 and the Udacity challenge. We show that we are able to forecast a vehicle’s state at various horizons, while outperforming the current state-of-the-art results on the related task of driving state estimation. We examine the contribution of vision features, and find that a model fed with vision features achieves an error that is 56.6% and 66.9% of the error of a model that does not use those features, on the Udacity and Comma2k19 datasets respectively.
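
To make the multi-horizon setup concrete, the following is a minimal sketch of how training samples could be cut from a recorded trace of vehicle states: each sample pairs a short history window with the states observed at several future offsets. The function name `make_multi_horizon_samples`, the `(speed, steering angle)` tuple layout, and the window length and horizon values are all assumptions made for illustration and are not taken from the paper. In a vision-guided model, the video frames covering the same history window would be paired with each sample.

```python
from typing import List, Sequence, Tuple

# One vehicle state: (speed, steering angle). The layout and units are
# assumptions for this sketch, not the paper's data format.
State = Tuple[float, float]


def make_multi_horizon_samples(
    trace: Sequence[State],
    history_len: int,
    horizons: Sequence[int],
) -> List[Tuple[List[State], List[State]]]:
    """Slice a state trace into (history, multi-horizon targets) pairs.

    history_len: number of past steps given to the model as input.
    horizons: future offsets, in steps, measured from the last history step.
    """
    if history_len <= 0:
        raise ValueError("history_len must be positive")
    if not horizons or min(horizons) <= 0:
        raise ValueError("horizons must be a non-empty list of positive offsets")

    max_horizon = max(horizons)
    samples: List[Tuple[List[State], List[State]]] = []
    # t is the index of the last observed step in each history window.
    for t in range(history_len - 1, len(trace) - max_horizon):
        history = list(trace[t - history_len + 1 : t + 1])
        targets = [trace[t + h] for h in horizons]
        samples.append((history, targets))
    return samples


if __name__ == "__main__":
    # Synthetic trace: speed rises by 1 per step, steering angle by 0.1.
    demo_trace = [(float(i), 0.1 * i) for i in range(10)]
    demo = make_multi_horizon_samples(demo_trace, history_len=3, horizons=[1, 2, 5])
    for history, targets in demo:
        print(history, "->", targets)
```

Supervising all horizons from one history window is what lets a single sample carry several targets; a trace shorter than `history_len + max(horizons)` yields no samples.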
