PointFlowNet: Learning Representations for Rigid Motion Estimation From Point Clouds

Despite significant progress in image-based 3D scene flow estimation, the performance of such approaches has not yet reached the fidelity required by many applications. Simultaneously, these applications are often not restricted to image-based estimation: laser scanners provide a popular alternative to traditional cameras, for example in the context of self-driving cars, as they directly yield a 3D point cloud. In this paper, we propose to estimate 3D motion from such unstructured point clouds using a deep neural network. In a single forward pass, our model jointly predicts 3D scene flow as well as the 3D bounding box and rigid body motion of objects in the scene. While the prospect of estimating 3D scene flow from unstructured point clouds is promising, it is also a challenging task. We show that the traditional global representation of rigid body motion prohibits inference by CNNs, and propose a translation-equivariant representation to circumvent this problem. Training our deep network requires a large dataset, so we augment real scans from KITTI with virtual objects, realistically modeling occlusions and simulating sensor noise. A thorough comparison with classic and learning-based techniques highlights the robustness of the proposed approach.
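The claim about the global representation of rigid body motion can be illustrated with a small numerical sketch. It is not taken from the paper's implementation: the helper names (`rot_z`, `global_translation`), the example object centers, and the choice of an object-centered translation as the "local" quantity are assumptions made for illustration. The sketch shows that the same physical motion (a rotation about the object's own center plus a small displacement) yields a different world-frame translation vector depending on where the object sits in the scene, whereas the translation measured at the object's center stays the same when the scene is shifted.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z-axis (yaw), angle in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def global_translation(R, center, displacement):
    """World-frame translation t such that p' = R p + t.

    Assumes the object rotates about its own center and then moves by
    `displacement`: p' = R (p - c) + c + d = R p + (c + d - R c).
    """
    return center + displacement - R @ center

# Hypothetical motion: 30 degree yaw and 1 m forward displacement.
R = rot_z(np.deg2rad(30.0))
d = np.array([1.0, 0.0, 0.0])

# The same object placed at two different locations in the scene.
c_a = np.array([5.0, 2.0, 0.0])
c_b = c_a + np.array([20.0, -10.0, 0.0])

t_a = global_translation(R, c_a, d)
t_b = global_translation(R, c_b, d)

# The world-frame translation depends on the object's location ...
global_differs = not np.allclose(t_a, t_b)

# ... while the translation measured at the object center does not.
local_a = (R @ c_a + t_a) - c_a
local_b = (R @ c_b + t_b) - c_b
local_matches = np.allclose(local_a, local_b)

print("global translation differs:", global_differs)
print("local translation matches:", local_matches)
```

Because a convolutional network applies the same filters at every location, a regression target that changes with absolute position, as the world-frame translation does here, is hard for it to predict from local features alone; a target that is unchanged under a shift of the scene does not have this problem.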
