Weakly Supervised Learning of Rigid 3D Scene Flow

We propose a data-driven scene flow estimation algorithm that exploits the observation that many 3D scenes can be explained by a collection of agents moving as rigid bodies. At the core of our method lies a deep architecture able to reason at the object level by considering 3D scene flow in conjunction with other 3D tasks. This object-level abstraction enables us to replace the requirement for dense scene flow supervision with simpler binary background segmentation masks and ego-motion annotations. Our mild supervision requirements make our method well suited for recently released massive data collections for autonomous driving, which do not contain dense scene flow annotations. As output, our model provides low-level cues, such as pointwise flow, and higher-level cues, such as holistic scene understanding at the level of rigid objects. We further propose a test-time optimization that refines the predicted rigid scene flow. We showcase the effectiveness and generalization capacity of our method on four different autonomous driving datasets. We release our source code and pre-trained models at github.com/zgojcic/Rigid3DSceneFlow.
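To make the rigid-body assumption concrete, the following is a minimal sketch, not the released implementation. It assumes a single rigid object whose points are given as an N x 3 NumPy array, and it uses the standard Kabsch least-squares fit as a stand-in for a rigid refinement step. The function names `rigid_flow` and `fit_rigid` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rigid_flow(points, R, t):
    """Flow induced by a rigid motion: f_i = R p_i + t - p_i (points is N x 3)."""
    return points @ R.T + t - points


def fit_rigid(src, dst):
    """Least-squares rigid transform (Kabsch) that maps src onto dst."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # 3 x 3 cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# Illustrative data (assumed, not from the paper): one object rotating about z.
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))
angle = 0.1
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([1.0, 0.0, 0.0])

# A noisy pointwise flow prediction, then its projection onto a rigid motion.
noisy_flow = rigid_flow(points, R_true, t_true) + 0.01 * rng.normal(size=points.shape)
R_est, t_est = fit_rigid(points, points + noisy_flow)
refined_flow = rigid_flow(points, R_est, t_est)
```

Fitting one transform per object constrains all of its points to move consistently, which is the intuition behind describing a scene by a few rigid motions rather than by unconstrained per-point vectors.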
