SLPC: A VRNN-based approach for stochastic lidar prediction and completion in autonomous driving

Predicting future 3D LiDAR point clouds is a challenging task that is useful in many autonomous driving applications, such as trajectory prediction, pose forecasting, and decision making. In this work, we propose a new LiDAR prediction framework based on generative models, namely Variational Recurrent Neural Networks (VRNNs), titled Stochastic LiDAR Prediction and Completion (SLPC). Our algorithm addresses the limitations of previous video prediction frameworks when dealing with sparse data by spatially inpainting the depth maps in the upcoming frames. Our contributions can be summarized as follows: we introduce the new task of predicting and completing depth maps from spatially sparse data, we present a sparse version of VRNNs, and we propose an effective self-supervised training method that does not require any labels. Experimental results illustrate the effectiveness of our framework in comparison to state-of-the-art methods in video prediction.
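To make the idea of label-free training on sparse depth maps concrete, the following is a minimal sketch of a masked reconstruction loss, in which a dense predicted frame is compared with the sparse future frame only at pixels that carry a LiDAR return. The abstract does not state the actual SLPC objective, so the function name `masked_l1`, the L1 penalty, and the convention that a depth of zero marks a missing pixel are all assumptions made for illustration.

```python
from typing import List

Depth = List[List[float]]  # one range image; 0.0 marks a pixel with no LiDAR return


def masked_l1(pred: Depth, target: Depth) -> float:
    """Mean absolute error over pixels where the target has a valid return."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t > 0.0:  # assumption: zero depth encodes "missing"
                total += abs(p - t)
                count += 1
    # With no valid pixels there is nothing to supervise, so return zero.
    return total / count if count else 0.0


# Hypothetical 2x3 example: the prediction is dense, the future frame is sparse.
predicted = [[10.0, 12.0, 8.0], [5.0, 6.0, 7.0]]
observed = [[9.0, 0.0, 8.0], [0.0, 0.0, 10.0]]
loss = masked_l1(predicted, observed)
```

Because the supervision signal is the recorded future frame itself, no manual annotation is involved; pixels without a return simply do not contribute to the loss, which leaves the model free to inpaint them.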
