CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations

We propose CaSPR, a method to learn object-centric canonical spatiotemporal point cloud representations of dynamically moving or evolving objects. Our goal is to enable information aggregation over time and the interrogation of object state at any spatiotemporal neighborhood in the past, observed or not. Different from previous work, CaSPR learns representations that support spacetime continuity, are robust to variable and irregularly spacetime-sampled point clouds, and generalize to unseen object instances. Our approach divides the problem into two subtasks. First, we explicitly encode time by mapping an input point cloud sequence to a spatiotemporally-canonicalized object space. We then leverage this canonicalization to learn a spatiotemporal latent representation using neural ordinary differential equations and a generative model of dynamically evolving shapes using continuous normalizing flows. We demonstrate the effectiveness of our method on several applications including shape reconstruction, camera pose estimation, continuous spatiotemporal sequence reconstruction, and correspondence estimation from irregularly or intermittently sampled observations.

[1]  Christopher De Sa,et al.  Neural Manifold Ordinary Differential Equations , 2020, NeurIPS.

[2]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[3]  Cordelia Schmid,et al.  Long-Term Temporal Convolutions for Action Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Leonidas J. Guibas,et al.  Learning Representations and Generative Models for 3D Point Clouds , 2017, ICML.

[5]  Prafulla Dhariwal,et al.  Glow: Generative Flow with Invertible 1x1 Convolutions , 2018, NeurIPS.

[6]  Leonidas J. Guibas,et al.  KPConv: Flexible and Deformable Convolution for Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Zi Jian Yew,et al.  RPM-Net: Robust Point Matching Using Learned Features , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Han Zhang,et al.  Approximation Capabilities of Neural Ordinary Differential Equations , 2019, ArXiv.

[9]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[10]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Francesc Moreno-Noguer,et al.  C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  David Duvenaud,et al.  Latent Ordinary Differential Equations for Irregularly-Sampled Time Series , 2019, NeurIPS.

[13]  Ivan Kobyzev,et al.  Normalizing Flows: Introduction and Ideas , 2019, ArXiv.

[14]  Andrea Vedaldi,et al.  C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Emanuele Menegatti,et al.  Quaternion Equivariant Capsule Networks for 3D Point Clouds , 2019, ECCV.

[16]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Weilin Huang,et al.  V4D: 4D Convolutional Neural Networks for Video-level Representation Learning , 2020, ICLR.

[18]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[19]  Adrien Gaidon,et al.  Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Marco Fiore,et al.  CloudLSTM: A Recurrent Neural Model for Spatiotemporal Point-cloud Stream Forecasting , 2019, AAAI.

[21]  Jeannette Bohg,et al.  MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Federico Tombari,et al.  3D Point Capsule Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Silvio Savarese,et al.  4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Yee Whye Teh,et al.  Augmented Neural ODEs , 2019, NeurIPS.

[25]  Srinath Sridhar,et al.  Continuous Geodesic Convolutions for Learning on 3D Shapes , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[26]  Lawrence Carin,et al.  Continuous-Time Flows for Efficient Inference and Density Estimation , 2017, ICML.

[27]  A. Lynn Abbott,et al.  Category-Level Articulated Object Pose Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Adam M. Oberman,et al.  How to train your neural ODE , 2020, ICML.

[29]  Dmitry Vetrov,et al.  Semi-Conditional Normalizing Flows for Semi-Supervised Learning , 2019, ArXiv.

[30]  Stefano Ermon,et al.  Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models , 2017, AAAI.

[31]  Xingjian Li,et al.  OT-Flow: Fast and Accurate Continuous Normalizing Flows via Optimal Transport , 2020, ArXiv.

[32]  Jitendra Malik,et al.  Learning 3D Human Dynamics From Video , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Slobodan Ilic,et al.  PPFNet: Global Context Aware Local Features for Robust 3D Point Matching , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Leonidas J. Guibas,et al.  Dynamic geometry registration , 2007, Symposium on Geometry Processing.

[35]  Joel L. Lebowitz,et al.  Boltzmann's Entropy and Time's Arrow , 1993 .

[36]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[38]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[39]  E. Tabak,et al.  A Family of Nonparametric Density Estimation Algorithms , 2013 .

[40]  C. Runge Ueber die numerische Auflösung von Differentialgleichungen , 1895 .

[41]  Eric Nalisnick,et al.  Normalizing Flows for Probabilistic Modeling and Inference , 2019, J. Mach. Learn. Res..

[42]  Leonidas J. Guibas,et al.  Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Andreas Geiger,et al.  Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Li Liu,et al.  Deep Learning for 3D Point Clouds: A Survey , 2020, IEEE transactions on pattern analysis and machine intelligence.

[45]  Nematollah Batmanghelich,et al.  Deep Diffeomorphic Normalizing Flows , 2018, ArXiv.

[46]  Jun Li,et al.  Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Radu Horaud,et al.  Temporal Surface Tracking Using Mesh Evolution , 2008, ECCV.

[48]  Victor Adrian Prisacariu,et al.  FlowNet3D++: Geometric Losses For Deep Scene Flow Estimation , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[49]  Noah Snavely,et al.  DualSDF: Semantic Shape Manipulation Using a Two-Level Representation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Hao Zhang,et al.  Learning Implicit Fields for Generative Shape Modeling , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Danilo Jimenez Rezende,et al.  Equivariant Hamiltonian Flows , 2019, ArXiv.

[53]  Shakir Mohamed,et al.  Variational Inference with Normalizing Flows , 2015, ICML.

[54]  Honglak Lee,et al.  Time Dependence in Non-Autonomous Neural ODEs , 2020, ICLR 2020.

[55]  Leonidas J. Guibas,et al.  Multiview Aggregation for Learning Category-Specific Shape Reconstruction , 2019, NeurIPS.

[56]  David Duvenaud,et al.  Neural Ordinary Differential Equations , 2018, NeurIPS.

[57]  Stephan Gunnemann,et al.  Equivariant Normalizing Flows for Point Processes and Sets , 2020 .

[58]  Subhransu Maji,et al.  SPLATNet: Sparse Lattice Networks for Point Cloud Processing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[59]  Kristian Kirsch,et al.  Theory Of Ordinary Differential Equations , 2016 .

[60]  Gordon Wetzstein,et al.  DeepVoxels: Learning Persistent 3D Feature Embeddings , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Jitendra Malik,et al.  Category-specific object reconstruction from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Aseem Behl,et al.  PointFlowNet: Learning Representations for Rigid Motion Estimation From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Ryan P. Adams,et al.  High-Dimensional Probability Estimation with Deep Density Models , 2013, ArXiv.

[64]  King-Sun Fu,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Publication Information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[65]  Ming-Yu Liu,et al.  PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[66]  Yong Jae Lee,et al.  HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Dong Tian,et al.  Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[68]  Gregory D. Hager,et al.  Temporal Convolutional Networks for Action Segmentation and Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Wei Wu,et al.  PointCNN: Convolution On X-Transformed Points , 2018, NeurIPS.

[70]  Bernhard Schölkopf,et al.  Seeing the Arrow of Time , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[71]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[72]  Mathieu Aubry,et al.  A Papier-Mache Approach to Learning 3D Surface Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[73]  Stefan Jeschke,et al.  Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds , 2019, ICLR.

[74]  Arnaud Doucet,et al.  Localised Generative Flows , 2019, ArXiv.

[75]  Lourdes Agapito,et al.  MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects , 2018, 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR).

[76]  Vincent Y. F. Tan,et al.  On Robustness of Neural Ordinary Differential Equations , 2020, ICLR.

[77]  Leonidas J. Guibas,et al.  FlowNet3D: Learning Scene Flow in 3D Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[78]  D. Bertsekas A distributed asynchronous relaxation algorithm for the assignment problem , 1985, 1985 24th IEEE Conference on Decision and Control.

[79]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[80]  Frank Noé,et al.  Equivariant Flows: sampling configurations for multi-body systems with symmetric energies , 2019, ArXiv.

[81]  Aviral Kumar,et al.  Graph Normalizing Flows , 2019, NeurIPS.

[82]  Leigh Page,et al.  The Nature of the Physical World , 1929 .

[83]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[84]  Sterling C. Johnson,et al.  DUAL-GLOW: Conditional Flow-Based Generative Model for Modality Transfer , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[85]  S. Klinke,et al.  Exploratory Projection Pursuit , 1995 .

[86]  Takeo Kanade,et al.  Three-dimensional scene flow , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[87]  David Duvenaud,et al.  FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models , 2018, ICLR.

[88]  Slobodan Ilic,et al.  PPF-FoldNet: Unsupervised Learning of Rotation Invariant 3D Local Descriptors , 2018, ECCV.

[89]  E Weinan,et al.  Monge-Ampère Flow for Generative Modeling , 2018, ArXiv.

[90]  Shakir Mohamed,et al.  Normalizing Flows on Riemannian Manifolds , 2016, ArXiv.