Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding

Multi-view projection methods have demonstrated promising performance on 3D understanding tasks like 3D classification and segmentation. However, it remains unclear how to combine such multi-view methods with the widely available 3D point clouds. Previous methods use unlearned heuristics to combine features at the point level. To this end, we introduce the concept of the multi-view point cloud (Voint cloud), representing each 3D point as a set of features extracted from several view-points. This novel 3D Voint cloud representation combines the compactness of 3D point cloud representation with the natural view-awareness of multi-view representation. Naturally, we can equip this new representation with convolutional and pooling operations. We deploy a Voint neural network (VointNet) to learn representations in the Voint space. Our novel representation achieves \sota performance on 3D classification, shape retrieval, and robust 3D part segmentation on standard benchmarks ( ScanObjectNN, ShapeNet Core55, and ShapeNet Parts).

[1]  Hei Law,et al.  Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline , 2021, ICML.

[2]  Weidong Cai,et al.  Walk in the Cloud: Learning Curves for Point Clouds Shape Analysis , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Chi-Wing Fu,et al.  3D-to-2D Distillation for Indoor Scene Parsing , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Zheng Zhang,et al.  Group-Free 3D Object Detection via Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Jiaya Jia,et al.  Bidirectional Projection Network for Cross Dimension Scene Understanding , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Ren Ng,et al.  PlenOctrees for Real-time Rendering of Neural Radiance Fields , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Ralph R. Martin,et al.  PCT: Point cloud transformer , 2020, Computational Visual Media.

[8]  Francesc Moreno-Noguer,et al.  D-NeRF: Neural Radiance Fields for Dynamic Scenes , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Ali K. Thabet,et al.  MVTN: Multi-View Transformation Network for 3D Shape Recognition , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[11]  Jonathan T. Barron,et al.  NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Thomas Funkhouser,et al.  Virtual Multi-view Fusion for 3D Semantic Segmentation , 2020, ECCV.

[13]  Jian Sun,et al.  View-GCN: View-Based Graph Convolutional Network for 3D Shape Analysis , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Zehao Yu,et al.  Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[16]  Ziming Zhang,et al.  Learning to Segment 3D Point Clouds in 2D Image Space , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Ali K. Thabet,et al.  AdvPC: Transferable Adversarial Perturbations on 3D Point Clouds , 2019, ECCV.

[18]  Wan-Yen Lo,et al.  Accelerating 3D deep learning with PyTorch3D , 2019, SIGGRAPH Asia 2020 Courses.

[19]  Geoffrey E. Hinton,et al.  CvxNet: Learnable Convex Decomposition , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Yin Zhou,et al.  End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds , 2019, CoRL.

[21]  Hao Su,et al.  Multi-View PointNet for 3D Scene Understanding , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[22]  Duc Thanh Nguyen,et al.  Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Jing Xu,et al.  Point-Based Multi-View Stereo Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Matthias Zwicker,et al.  Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds From Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Yue Gao,et al.  MLVCNN: Multi-Loop-View Convolutional Neural Network for 3D Shape Retrieval , 2019, AAAI.

[26]  Song Han,et al.  Point-Voxel CNN for Efficient 3D Deep Learning , 2019, NeurIPS.

[27]  Kilian Q. Weinberger,et al.  LDLS: 3-D Object Segmentation Through Label Diffusion From 2-D Images , 2019, IEEE Robotics and Automation Letters.

[28]  Leonidas J. Guibas,et al.  KPConv: Flexible and Deformable Convolution for Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Silvio Savarese,et al.  4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Shiming Xiang,et al.  Relation-Shape Convolutional Neural Network for Point Cloud Analysis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Mohamed Boussaha,et al.  Point Cloud Oversegmentation With Graph-Structured Deep Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Kostas Daniilidis,et al.  Equivariant Multi-View Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Long Quan,et al.  Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Leonidas J. Guibas,et al.  Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Matthias Nießner,et al.  3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Yue Gao,et al.  MeshNet: Mesh Neural Network for 3D Shape Representation , 2018, AAAI.

[39]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[40]  Frank Hutter,et al.  Decoupled Weight Decay Regularization , 2017, ICLR.

[41]  R. Giryes,et al.  MeshCNN: a network with an edge , 2019, ACM Trans. Graph..

[42]  Wei Wu,et al.  PointCNN: Convolution On X-Transformed Points , 2018, NeurIPS.

[43]  Yue Gao,et al.  PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition , 2018, ACM Multimedia.

[44]  Yue Gao,et al.  GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Long Quan,et al.  MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[46]  Narendra Ahuja,et al.  DeepMVS: Learning Multi-view Stereopsis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Yifan Xu,et al.  SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters , 2018, ECCV.

[48]  Matthias Nießner,et al.  3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation , 2018, ECCV.

[49]  Song Bai,et al.  Triplet-Center Loss for Multi-view 3D Object Retrieval , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[51]  Laurens van der Maaten,et al.  3D Semantic Segmentation with Submanifold Sparse Convolutional Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52]  Martin Simonovsky,et al.  Large-Scale Point Cloud Semantic Segmentation with Superpoint Graphs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Ulrich Neumann,et al.  SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54]  Yasuyuki Matsushita,et al.  RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[55]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[56]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[57]  Subhransu Maji,et al.  3D Shape Segmentation with Projective Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Ioannis Pratikakis,et al.  Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval , 2017, 3DOR@Eurographics.

[61]  Leonidas J. Guibas,et al.  A scalable active framework for region annotation in 3D shape collections , 2016, ACM Trans. Graph..

[62]  Max Welling,et al.  Group Equivariant Convolutional Networks , 2016, ICML.

[63]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[65]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[66]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[67]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[69]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Andreas Krause,et al.  Advances in Neural Information Processing Systems (NIPS) , 2014 .

[71]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[72]  J. Ponce,et al.  Accurate, Dense, and Robust Multi-View Stereopsis , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[73]  Leonard McMillan,et al.  Plenoptic Modeling: An Image-Based Rendering System , 2023 .

[74]  Stephen Grossberg,et al.  Recognition of 3-D Objects from Multiple 2-D Views by a Self-Organizing Neural Architecture , 1994 .

[75]  Takeo Kanade,et al.  A multiple-baseline stereo , 1991, Proceedings. 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.