MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization

We present MultiBodySync, a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for multiple input 3D point clouds. We investigate two non-trivial challenges posed by this multi-scan, multi-body setting: (i) guaranteeing correspondence and segmentation consistency across multiple input point clouds that capture different spatial arrangements of bodies or body parts; and (ii) obtaining robust motion-based rigid body segmentation that is applicable to novel object categories. To address these issues, we propose an approach that incorporates spectral synchronization into an iterative deep declarative network, so as to recover consistent correspondences and motion segmentation simultaneously. At the same time, by explicitly disentangling the correspondence and motion segmentation estimation modules, we achieve strong generalizability across different object categories. Our extensive evaluations demonstrate that the method is effective on various datasets, ranging from rigid parts in articulated objects to individually moving objects in a 3D scene, for both single-view and full point clouds. Code is available at https://github.com/huangjh-pub/multibody-sync.
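To make the notion of spectral synchronization concrete, the following is a minimal sketch of the classical spectral relaxation for rotation synchronization, written in NumPy. It is not the MultiBodySync implementation: it assumes a complete, noise-free measurement graph over SO(3), and the function names (`random_rotation`, `project_to_so3`, `spectral_rotation_sync`) are hypothetical and chosen only for this illustration. Given pairwise relative rotations R_ij = R_i R_j^T, it recovers absolute rotations up to a global gauge from the top three eigenvectors of the stacked block matrix.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation (helper for the illustration only)."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one column so that det(q) = +1
    return q


def project_to_so3(m):
    """Return the rotation closest to m in the Frobenius sense, via SVD."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


def spectral_rotation_sync(rel, n):
    """Recover n absolute rotations from relative ones.

    rel maps (i, j) to an estimate of R_i @ R_j.T. This sketch assumes
    every pair i < j is present; an incomplete graph would need degree
    normalization, which is omitted here.
    """
    z = np.zeros((3 * n, 3 * n))
    for i in range(n):
        z[3 * i:3 * i + 3, 3 * i:3 * i + 3] = np.eye(3)
    for (i, j), r in rel.items():
        z[3 * i:3 * i + 3, 3 * j:3 * j + 3] = r
        z[3 * j:3 * j + 3, 3 * i:3 * i + 3] = r.T

    # eigh returns eigenvalues in ascending order, so the last three
    # columns span the dominant eigenspace.
    _, vecs = np.linalg.eigh(z)
    top = vecs[:, -3:]
    blocks = [top[3 * i:3 * i + 3] for i in range(n)]

    # Fix the gauge by expressing every rotation relative to scan 0.
    return [project_to_so3(b @ blocks[0].T) for b in blocks]
```

With exact inputs, each returned matrix equals R_i R_0^T, that is, the absolute rotation of scan i expressed relative to scan 0. With noisy relative rotations the same procedure yields an approximate, globally consistent estimate.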
