Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods

For autonomous agents to successfully operate in the real world, the ability to anticipate future scene states is a key competence. In real-world scenarios, future states become increasingly uncertain and multi-modal, particularly on long time horizons. Dropout based Bayesian inference provides a computationally tractable, theoretically well grounded approach to learn likely hypotheses/models to deal with uncertain futures and make predictions that correspond well to observations -- are well calibrated. However, it turns out that such approaches fall short to capture complex real-world scenes, even falling behind in accuracy when compared to the plain deterministic approaches. This is because the used log-likelihood estimate discourages diversity. In this work, we propose a novel Bayesian formulation for anticipating future scene states which leverages synthetic likelihoods that encourage the learning of diverse models to accurately capture the multi-modal nature of future scene states. We show that our approach achieves accurate state-of-the-art predictions and calibrated probabilities through extensive experiments for scene anticipation on Cityscapes dataset. Moreover, we show that our approach generalizes across diverse tasks such as digit generation and precipitation forecasting.

[1]  Philip H. S. Torr,et al.  DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Zoubin Ghahramani,et al.  Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference , 2015, ArXiv.

[4]  Ian Osband,et al.  Risk versus Uncertainty in Deep Learning: Bayes, Bootstrap and the Dangers of Dropout , 2016 .

[5]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[6]  Honglak Lee,et al.  Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.

[7]  Ruslan Salakhutdinov,et al.  Learning Stochastic Feedforward Neural Networks , 2013, NIPS.

[8]  Yang Wang,et al.  Future Semantic Segmentation with Convolutional LSTM , 2018, BMVC.

[9]  Shakir Mohamed,et al.  Variational Approaches for Auto-Encoding Generative Adversarial Networks , 2017, ArXiv.

[10]  Bernt Schiele,et al.  Long-Term On-board Prediction of People in Traffic Scenes Under Uncertainty , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Alex Kendall,et al.  What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[13]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[14]  Gang Hua,et al.  CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Yann LeCun,et al.  Predicting Deeper into the Future of Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Rob Fergus,et al.  Stochastic Video Generation with a Learned Prior , 2018, ICML.

[17]  Shuicheng Yan,et al.  Predicting Scene Parsing and Motion Dynamics in the Future , 2017, NIPS.

[18]  Andrew Gordon Wilson,et al.  Bayesian GAN , 2017, NIPS.

[19]  Dit-Yan Yeung,et al.  Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[20]  Sergey Levine,et al.  Stochastic Variational Video Prediction , 2017, ICLR.

[21]  Bernt Schiele,et al.  Accurate and Diverse Sampling of Sequences Based on a "Best of Many" Sample Objective , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Jiajun Wu,et al.  Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks , 2016, NIPS.

[23]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[24]  Yann LeCun,et al.  Deep multi-scale video prediction beyond mean square error , 2015, ICLR.

[25]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[26]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[27]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[28]  Yann LeCun,et al.  Predicting Future Instance Segmentations by Forecasting Convolutional Features , 2018, ECCV.

[29]  S. Wood Statistical inference for noisy nonlinear ecological dynamic systems , 2010, Nature.

[30]  Sergey Levine,et al.  MuProp: Unbiased Backpropagation for Stochastic Neural Networks , 2015, ICLR.