General-Purpose Deep Point Cloud Feature Extractor

Depth sensors used in autonomous driving and gaming systems often report back 3D point clouds. The lack of structure from these sensors does not allow these systems to take advantage of recent advances in convolutional neural networks which are dependent upon traditional filtering and pooling operations. Analogous to image based convolutional architectures, recently introduced graph based architectures afford similar filtering and pooling operations on arbitrary graphs. We adopt these graph based methods to 3D point clouds to introduce a generic vector representation of 3D graphs, we call graph 3D (G3D). We believe we are the first to use large scale transfer learning on 3D point cloud data and demonstrate the discriminant power of our salient latent representation of 3D point clouds on unforeseen test sets. By using our G3D network (G3DNet) as a feature extractor, and then pairing G3D feature vectors with a standard classifier, we achieve the best accuracy on ModelNet10 (93.1%) and ModelNet 40 (91.7%) for a graph network, and comparable performance on the Sydney Urban Objects dataset to other methods. This general-purpose feature extractor can be used as an off-the-shelf component in other 3D scene understanding or object tracking works.

[1]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[2]  Nathan D. Cahill,et al.  Robust Spatial Filtering With Graph Convolutional Neural Networks , 2017, IEEE Journal of Selected Topics in Signal Processing.

[3]  Jitendra Malik,et al.  Generic 3D Representation via Pose Estimation and Matching , 2016, ECCV.

[4]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[5]  Leonidas J. Guibas,et al.  SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Joan Bruna,et al.  Spectral Networks and Locally Connected Networks on Graphs , 2013, ICLR.

[7]  Kaleem Siddiqi,et al.  Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition , 2019, BMVC.

[8]  Xu Xu,et al.  Beam search for learning a deep Convolutional Neural Network of 3D shapes , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[9]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Xinlei Chen,et al.  Learning a Recurrent Visual Representation for Image Caption Generation , 2014, ArXiv.

[11]  Jonathan Masci,et al.  Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Xiang Li,et al.  LightNet: A Lightweight 3D Convolutional Neural Network for Real-Time 3D Object Recognition , 2017, 3DOR@Eurographics.

[13]  Bin Dai,et al.  Performance of global descriptors for velodyne-based urban object recognition , 2014, 2014 IEEE Intelligent Vehicles Symposium Proceedings.

[14]  Shagan Sah,et al.  Towards 3D convolutional neural networks with meshes , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[15]  Ludovico Minto,et al.  Deep learning for 3D shape classification from multiple depth maps , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[16]  Karthik Ramani,et al.  Deep Learning 3D Shape Surfaces Using Geometry Images , 2016, ECCV.

[17]  Ioannis Pratikakis,et al.  Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval , 2017, 3DOR@Eurographics.

[18]  Thomas Brox,et al.  Unsupervised Generation of a View Point Annotated Car Dataset from Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[21]  Mathias Niepert,et al.  Learning Convolutional Neural Networks for Graphs , 2016, ICML.

[22]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[24]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[25]  Thomas Brox,et al.  Orientation-boosted Voxel Nets for 3D Object Recognition , 2016, BMVC.

[26]  Donald F. Towsley,et al.  Diffusion-Convolutional Neural Networks , 2015, NIPS.

[27]  Joan Bruna,et al.  Deep Convolutional Networks on Graph-Structured Data , 2015, ArXiv.

[28]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[29]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[30]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[31]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[32]  Luc Van Gool,et al.  Learning Where to Classify in Multi-view Semantic Segmentation , 2014, ECCV.

[33]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.

[34]  Stefan Leutenegger,et al.  Pairwise Decomposition of Image Sequences for Active Multi-view Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Marcus A. Badgeley,et al.  Wide and deep volumetric residual networks for volumetric image classification , 2017, ArXiv.

[36]  Trevor Darrell,et al.  Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[37]  José García Rodríguez,et al.  LonchaNet: A sliced-based CNN architecture for real-time 3D object recognition , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[38]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[39]  Geun-Sik Jo,et al.  Unsupervised feature learning for classification , 2016 .

[40]  Zhichao Zhou,et al.  DeepPano: Deep Panoramic Representation for 3-D Shape Recognition , 2015, IEEE Signal Processing Letters.

[41]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Victor S. Lempitsky,et al.  Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[43]  Xiang Li,et al.  Toward real-time 3D object recognition: A lightweight volumetric CNN framework using multitask learning , 2017, Comput. Graph..

[44]  Luke N. Olson,et al.  Algebraic Multigrid for Discrete Differential Forms , 2008 .

[45]  Jiajun Wu,et al.  Synthesizing 3D Shapes via Modeling Multi-view Depth Maps and Silhouettes with Deep Generative Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[48]  Nikos Komodakis,et al.  Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  François Goulette,et al.  Paris-rue-Madame Database - A 3D Mobile Laser Scanner Dataset for Benchmarking Urban Detection, Segmentation and Classification Methods , 2014, ICPRAM.

[50]  José M. F. Moura,et al.  Discrete Signal Processing on Graphs , 2012, IEEE Transactions on Signal Processing.

[51]  Anath Fischer,et al.  3D Point Cloud Classification and Segmentation using 3D Modified Fisher Vector Representation for Convolutional Neural Networks , 2017, ArXiv.

[52]  Barnabás Póczos,et al.  Deep Learning with Sets and Point Clouds , 2016, ICLR.

[53]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[54]  Paolo Cignoni,et al.  MeshLab: an Open-Source 3D Mesh Processing System , 2008, ERCIM News.

[55]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[56]  Longin Jan Latecki,et al.  GIFT: A Real-Time and Scalable 3D Shape Search Engine , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).