PointFlowNet: Learning Representations for 3D Scene Flow Estimation from Point Clouds

Despite significant progress in image-based 3D scene flow estimation, the performance of such approaches has not yet reached the fidelity required by many applications. Simultaneously, these applications are often not restricted to image-based estimation: laser scanners provide a popular alternative to traditional cameras, for example in the context of self-driving cars, as they directly yield a 3D point cloud. In this paper, we propose to estimate 3D scene flow from such unstructured point clouds using a deep neural network. In a single forward pass, our model jointly predicts 3D scene flow as well as the 3D bounding box and rigid body motion of objects in the scene. While the prospect of estimating 3D scene flow from unstructured point clouds is promising, it is also a challenging task. We show that the traditional global representation of rigid body motion prohibits inference by CNNs, and propose a translation equivariant representation to circumvent this problem. Training our deep network requires a large dataset; we therefore augment real scans from KITTI with virtual objects, realistically modeling occlusions and simulating sensor noise. A thorough comparison with classic and learning-based techniques highlights the robustness of the proposed approach.
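The claim that a global rigid-motion parameterization is at odds with a translation-equivariant CNN can be made concrete with a small numerical sketch. The Python snippet below is an illustrative assumption, not the authors' implementation: it keeps the rotation and replaces the global translation with the displacement each point actually undergoes. That displacement stays attached to the point when the whole scene is shifted, whereas the global translation t changes to t + (I - R)o under a scene shift o, i.e. it depends on absolute position. All function and variable names are hypothetical.

```python
import numpy as np

def global_to_local(R, t, points):
    """Re-express a global rigid motion x' = R x + t as a local,
    translation-equivariant representation at each 3D point.

    For a point x, the local translation is its displacement
    d(x) = R x + t - x, while the rotation R is shared. Shifting the
    whole scene by an offset o leaves d(x) unchanged and simply moves it
    with the point, so the displacement field is translation equivariant;
    the global translation t, by contrast, becomes t + (I - R) o.
    """
    displacements = points @ R.T + t - points      # (N, 3) per-point flow
    return R, displacements

def local_to_global(R, displacement, point):
    """Recover the global translation from one point and its displacement."""
    t = displacement + point - R @ point
    return R, t

if __name__ == "__main__":
    # Toy example: rotate 10 degrees about the z-axis and translate.
    theta = np.deg2rad(10.0)
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
    t = np.array([1.0, 0.5, 0.0])

    pts = np.random.rand(5, 3) * 20.0              # points far from the origin
    _, disp = global_to_local(R, t, pts)

    # The global translation is recovered exactly from any single point.
    _, t_rec = local_to_global(R, disp[0], pts[0])
    assert np.allclose(t, t_rec)
```

In this local form, a fully convolutional network can in principle predict each point's motion from local evidence alone, which mirrors the intuition behind the translation equivariant representation described above.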
