Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

Future prediction is a fundamental principle of intelligence that helps plan actions and avoid possible dangers. As the future is uncertain to a large extent, modeling the uncertainty and multimodality of the future states is of great relevance. Existing approaches are rather limited in this regard and mostly yield a single hypothesis of the future or, at the best, strongly constrained mixture components that suffer from instabilities in training and mode collapse. In this work, we present an approach that involves the prediction of several samples of the future with a winner-takes-all loss and iterative grouping of samples to multiple modes. Moreover, we discuss how to evaluate predicted multimodal distributions, including the common real scenario, where only a single sample from the ground-truth distribution is available for evaluation. We show on synthetic and real data that the proposed approach triggers good estimates of multimodal distributions and avoids mode collapse.

[1]  Yoichi Sato,et al.  Future Person Localization in First-Person Videos , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Yann LeCun,et al.  Predicting Deeper into the Future of Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Bernt Schiele,et al.  Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods , 2018, ICLR.

[4]  Philip H. S. Torr,et al.  DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  David A. Forsyth,et al.  Structural Consistency and Controllability for Diverse Colorization , 2018, ECCV.

[6]  Bernt Schiele,et al.  Accurate and Diverse Sampling of Sequences Based on a "Best of Many" Sample Objective , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Kyungjae Lee,et al.  Uncertainty-Aware Learning from Demonstration Using Mixture Density Networks with Sampling-Free Variance Modeling , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[8]  Wei Zhan,et al.  Probabilistic Prediction of Vehicle Semantic Intention and Motion , 2018, 2018 IEEE Intelligent Vehicles Symposium (IV).

[9]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[10]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  David J. Fleet,et al.  This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Gaussian Process Dynamical Model , 2007 .

[12]  Niloy J. Mitra,et al.  Learning A Physical Long-term Predictor , 2017, ArXiv.

[13]  Pushmeet Kohli,et al.  Multiple Choice Learning: Learning to Produce Multiple Structured Outputs , 2012, NIPS.

[14]  Yann LeCun,et al.  Predicting Future Instance Segmentations by Forecasting Convolutional Features , 2018, ECCV.

[15]  D. B. Preston Spectral Analysis and Time Series , 1983 .

[16]  Wolfram Burgard,et al.  Multimodal interaction-aware motion prediction for autonomous street crossing , 2018, Int. J. Robotics Res..

[17]  Michael Cogswell,et al.  Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles , 2016, NIPS.

[18]  Wei Liu,et al.  Deep Learning Driven Visual Path Prediction From a Single Image , 2016, IEEE Transactions on Image Processing.

[19]  Alexei A. Efros,et al.  Time-Agnostic Prediction: Predicting Predictable Video Frames , 2018, ICLR.

[20]  Mario Sznaier,et al.  DYAN: A Dynamical Atoms-Based Network for Video Prediction , 2018, ECCV.

[21]  Kostas Daniilidis,et al.  Learning what you can do before doing anything , 2018, ICLR.

[22]  Yu Yao,et al.  Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[23]  Henggang Cui,et al.  Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving , 2018, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[24]  Shuicheng Yan,et al.  Predicting Scene Parsing and Motion Dynamics in the Future , 2017, NIPS.

[25]  Maximilian Baust,et al.  Learning in an Uncertain World: Representing Ambiguity Through Multiple Hypotheses , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Jiajun Wu,et al.  Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks , 2016, NIPS.

[27]  Silvio Savarese,et al.  Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes , 2016, ECCV.

[28]  G. Yule On a Method of Investigating Periodicities in Disturbed Series, with Special Reference to Wolfer's Sunspot Numbers , 1927 .

[29]  Sebastian Nowozin,et al.  Deep Directional Statistics: Pose Estimation with Uncertainty Quantification , 2018, ECCV.

[30]  H. Akaike Power spectrum estimation through autoregressive model fitting , 1969 .

[31]  Ruben Villegas,et al.  Learning to Generate Long-term Future via Hierarchical Prediction , 2017, ICML.

[32]  Ming-Hsuan Yang,et al.  Flow-Grounded Spatial-Temporal Video Prediction from Still Images , 2018, ECCV.

[33]  Joseph Curro,et al.  Deriving confidence from artificial neural networks for navigation , 2018, 2018 IEEE/ION Position, Location and Navigation Symposium (PLANS).

[34]  Henggang Cui,et al.  Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[35]  Henggang Cui,et al.  Motion Prediction of Traffic Actors for Autonomous Driving using Deep Convolutional Networks , 2018, ArXiv.

[36]  Kostas Daniilidis,et al.  Unsupervised Learning of Sensorimotor Affordances by Stochastic Future Prediction , 2018, ArXiv.

[37]  Shenghua Gao,et al.  Future Frame Prediction for Anomaly Detection - A New Baseline , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  P. McCullagh,et al.  Generalized Linear Models , 1992 .

[39]  T. Başar,et al.  A New Approach to Linear Filtering and Prediction Problems , 2001 .

[40]  Yann LeCun,et al.  Deep multi-scale video prediction beyond mean square error , 2015, ICLR.

[41]  Leonidas J. Guibas,et al.  A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[42]  Ian T. Nabney,et al.  Regularisation of mixture density networks , 1999 .

[43]  Seunghoon Hong,et al.  Decomposing Motion and Content for Natural Video Sequence Prediction , 2017, ICLR.

[44]  Hongdong Li,et al.  Action Anticipation By Predicting Future Dynamic Images , 2018, ECCV Workshops.

[45]  Thomas Brox,et al.  Uncertainty Estimates and Multi-hypotheses Networks for Optical Flow , 2018, ECCV.

[46]  A. O'Hagan,et al.  Curve Fitting and Optimal Design for Prediction , 1978 .

[47]  P. McCullagh,et al.  Generalized Linear Models , 1984 .

[48]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[49]  David W. Jacobs,et al.  Approximate earth mover’s distance in linear time , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Antonio Torralba,et al.  Anticipating Visual Representations from Unlabeled Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Ruben Villegas,et al.  Hierarchical Long-term Video Prediction without Supervision , 2018, ICML.

[52]  C. Bishop Mixture density networks , 1994 .

[53]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[54]  Christopher K. I. Williams Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond , 1999, Learning in Graphical Models.

[55]  Guillaume Gravier,et al.  One-Step Time-Dependent Future Video Frame Prediction with a Convolutional Encoder-Decoder Neural Network , 2016, ICIAP.

[56]  Marco Pavone,et al.  Distributional Prediction of Human Driving Behaviours using Mixture Density Networks , 2016 .

[57]  Lourdes Agapito,et al.  DiverseNet: When One Right Answer is not Enough , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[58]  Michael S. Ryoo,et al.  Forecasting Hands and Objects in Future Frames , 2018, ECCV Workshops.

[59]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[60]  Bo Zong,et al.  Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection , 2018, ICLR.

[61]  Vladlen Koltun,et al.  Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[62]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.