Conditional Flow Variational Autoencoders for Structured Sequence Prediction

Prediction of future states of the environment and interacting agents is a key competence required for autonomous agents to operate successfully in the real world. Prior work on structured sequence prediction based on latent variable models imposes a uni-modal standard Gaussian prior on the latent variables. This induces a strong model bias that makes it challenging to fully capture the multi-modality of the distribution of future states. In this work, we introduce Conditional Flow Variational Autoencoders (CF-VAE), which use our novel conditional normalizing flow based prior to capture complex multi-modal conditional distributions for effective structured sequence prediction. Moreover, we propose two novel regularization schemes that stabilize training and deal with posterior collapse, leading to a better fit to the target data distribution. Our experiments on three multi-modal structured sequence prediction datasets -- MNIST Sequences, Stanford Drone and HighD -- show that the proposed method obtains state-of-the-art results across different evaluation metrics.
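
To make the idea of a conditional flow based prior concrete, the following is a minimal sketch of the change-of-variables computation behind a prior p(z | x) defined by an invertible map from base noise to the latent variable. It is not the CF-VAE architecture: the `conditioner` function, its coefficients, and the single elementwise affine layer are assumptions chosen for illustration. A single affine layer still yields a Gaussian conditional prior; capturing multi-modal distributions would require stacking non-linear conditional layers.

```python
import math


def standard_normal_logpdf(eps):
    """Log density of a standard Gaussian, summed over dimensions."""
    return sum(-0.5 * (e * e + math.log(2.0 * math.pi)) for e in eps)


def conditioner(x):
    """Hypothetical stand-in for a conditioning network.

    Maps a scalar context x to a shift and a log-scale for each of two
    latent dimensions. The coefficients are arbitrary illustration values.
    """
    shift = [2.0 * x, -2.0 * x]
    log_scale = [0.5 * x, 0.25 * x]
    return shift, log_scale


def flow_forward(eps, x):
    """Map base noise eps to a latent z, given the context x."""
    shift, log_scale = conditioner(x)
    return [m + math.exp(s) * e for e, m, s in zip(eps, shift, log_scale)]


def prior_log_prob(z, x):
    """Evaluate log p(z | x) with the change-of-variables formula."""
    shift, log_scale = conditioner(x)
    # Invert the flow to recover the base noise.
    eps = [(zi - m) * math.exp(-s) for zi, m, s in zip(z, shift, log_scale)]
    # For an elementwise affine map, log|det d(eps)/d(z)| = -sum(log_scale).
    return standard_normal_logpdf(eps) - sum(log_scale)


if __name__ == "__main__":
    context = 1.0
    noise = [0.3, -1.2]
    latent = flow_forward(noise, context)
    print(latent, prior_log_prob(latent, context))
```

Because the shift and scale depend on the context, the density of the same latent point changes with x, which is what distinguishes this from a fixed standard Gaussian prior.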
