Iterative Transformer Network for 3D Point Cloud

A 3D point cloud is an efficient and flexible representation of 3D structure. Recently, neural networks operating directly on point clouds have shown superior performance on 3D understanding tasks such as shape classification and part segmentation. However, performance on these tasks is typically evaluated on complete shapes aligned in a canonical frame, whereas real-world 3D data are partial and unaligned. A key challenge in learning from partial, unaligned point cloud data is learning features that are invariant or equivariant with respect to geometric transformations. To address this challenge, we propose the Iterative Transformer Network (IT-Net), a network module that canonicalizes the pose of a partial object with a series of 3D rigid transformations predicted in an iterative fashion. We demonstrate the efficacy of IT-Net as an anytime pose estimator from partial point clouds without using complete object models. Further, we show that IT-Net achieves superior performance over alternative 3D transformer networks on tasks such as partial shape classification and object part segmentation.
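
The abstract describes the core mechanism at a high level: at each iteration, a rigid transform is predicted from the current (partially canonicalized) point cloud, applied to the points, and composed with the pose accumulated so far. The sketch below illustrates that iterative loop in PyTorch; the `PoseRegressor` architecture, the quaternion-plus-translation output parameterization, and the identity-biased initialization are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of an iterative pose-canonicalization loop (assumptions noted above).
import torch
import torch.nn as nn

def quat_to_rotmat(q):
    """Convert a batch of quaternions (w, x, y, z) to 3x3 rotation matrices."""
    q = q / q.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    w, x, y, z = q.unbind(-1)
    R = torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
        2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(q.shape[:-1] + (3, 3))
    return R

class PoseRegressor(nn.Module):
    """Hypothetical per-iteration regressor: shared point MLP + max-pool + FC head."""
    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 256), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 7))

    def forward(self, pts):                     # pts: (B, N, 3)
        feat = self.point_mlp(pts).max(dim=1).values
        out = self.head(feat)                   # (B, 7): quaternion + translation
        quat = out[:, :4] + out.new_tensor([1., 0., 0., 0.])  # bias toward identity rotation
        return quat, out[:, 4:]

def iterative_canonicalize(pts, regressor, n_iters=3):
    """Apply n_iters predicted rigid transforms; return the transformed points and
    the accumulated pose (R, t) such that canonical = R @ input + t."""
    B = pts.shape[0]
    R_acc = torch.eye(3, device=pts.device).expand(B, 3, 3).clone()
    t_acc = torch.zeros(B, 3, device=pts.device)
    for _ in range(n_iters):
        quat, t = regressor(pts)
        R = quat_to_rotmat(quat)
        pts = pts @ R.transpose(1, 2) + t.unsqueeze(1)    # transform the points
        R_acc = R @ R_acc                                 # compose with previous rotation
        t_acc = (R @ t_acc.unsqueeze(-1)).squeeze(-1) + t  # compose with previous translation
    return pts, R_acc, t_acc
```

Because each iteration refines the pose from the previous iteration's output, the composed transform after any number of iterations is itself a valid rigid pose estimate, which is what makes the module usable as an anytime pose estimator.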
