论文信息 - Fine-Grained 3D Shape Classification With Hierarchical Part-View Attention

Fine-Grained 3D Shape Classification With Hierarchical Part-View Attention

Fine-grained 3D shape classification is important for shape understanding and analysis, which poses a challenging research problem. However, the studies on the fine-grained 3D shape classification have rarely been explored, due to the lack of fine-grained 3D shape benchmarks. To address this issue, we first introduce a new 3D shape dataset (named FG3D dataset) with fine-grained class labels, which consists of three categories including airplane, car and chair. Each category consists of several subcategories at a fine-grained level. According to our experiments under this fine-grained dataset, we find that state-of-the-art methods are significantly limited by the small variance among subcategories in the same category. To resolve this problem, we further propose a novel fine-grained 3D shape classification method named FG3D-Net to capture the fine-grained local details of 3D shapes from multiple rendered views. Specifically, we first train a Region Proposal Network (RPN) to detect the generally semantic parts inside multiple views under the benchmark of generally semantic part detection. Then, we design a hierarchical part-view attention aggregation module to learn a global shape representation by aggregating generally semantic part features, which preserves the local details of 3D shapes. The part-view attention module hierarchically leverages part-level and view-level attention to increase the discriminability of our features. The part-level attention highlights the important parts in each view while the view-level attention highlights the discriminative views among all the views of the same object. In addition, we integrate a Recurrent Neural Network (RNN) to capture the spatial relationships among sequential views from different viewpoints. Our results under the fine-grained 3D shape dataset show that our method outperforms other state-of-the-art methods. The FG3D dataset is available at https://github.com/liuxinhai/FG3D-Net.

Matthias Zwicker | Xinhai Liu | Zhizhong Han | Yu-Shen Liu

[1] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2] Kaleem Siddiqi,et al. Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition , 2019, BMVC.

[3] Thomas A. Funkhouser,et al. A benchmark for 3D mesh segmentation , 2009, ACM Trans. Graph..

[4] Alexander M. Bronstein,et al. Numerical Geometry of Non-Rigid Shapes , 2009, Monographs in Computer Science.

[5] Zhiqiang Shen,et al. Multiple Granularity Descriptors for Fine-Grained Categorization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6] Junwei Han,et al. Mesh Convolutional Restricted Boltzmann Machines for Unsupervised Learning of Features With Structure Preservation on 3-D Meshes , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[7] Michael J. Black,et al. FAUST: Dataset and Evaluation for 3D Mesh Registration , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8] Matthias Zwicker,et al. View Inter-Prediction GAN: Unsupervised Representation Learning for 3D Shapes by Learning Global Shape Memories to Support Local View Predictions , 2018, AAAI.

[9] Matthias Zwicker,et al. Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds From Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10] Leonidas J. Guibas,et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Jianxiong Xiao,et al. 3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Ryan Farrell,et al. Aligned to the Object, Not to the Image: A Unified Pose-Aligned Representation for Fine-Grained Recognition , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[13] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14] Pietro Perona,et al. The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[15] Liang Lin,et al. Multi-label Image Recognition by Recurrently Discovering Attentional Regions , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16] Wei Liu,et al. Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[17] Ioannis Pratikakis,et al. Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval , 2017, 3DOR@Eurographics.

[18] Yasuyuki Matsushita,et al. RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19] Marc Alexa,et al. ABC: A Big CAD Model Dataset for Geometric Deep Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[21] Leonidas J. Guibas,et al. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[22] Jiajun Wu,et al. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[23] Zhizhong Han,et al. CF-SIS: Semantic-Instance Segmentation of 3D Point Clouds by Context Fusion with Self-Attention , 2020, ACM Multimedia.

[24] Oliver Grau,et al. VConv-DAE: Deep Volumetric Shape Learning Without Object Labels , 2016, ECCV Workshops.

[25] Xuelong Li,et al. Unsupervised 3D Local Feature Learning by Circle Convolutional Restricted Boltzmann Machine , 2016, IEEE Transactions on Image Processing.

[26] Leonidas J. Guibas,et al. PartNet: A Large-Scale Benchmark for Fine-Grained and Hierarchical Part-Level 3D Object Understanding , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[28] Matthias Zwicker,et al. Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-based Sequence to Sequence Network , 2018, AAAI.

[29] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30] Marcel Simon,et al. Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31] Fei-Fei Li,et al. Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[32] Leonidas J. Guibas,et al. Volumetric and Multi-view CNNs for Object Classification on 3D Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Xinhai Liu,et al. Point2SpatialCapsule: Aggregating Features and Spatial Relationships of Local Regions on Point Clouds Using Spatial-Aware Capsules , 2019, IEEE Transactions on Image Processing.

[34] Leonidas J. Guibas,et al. Learning Representations and Generative Models for 3D Point Clouds , 2017, ICML.

[35] Matthias Zwicker,et al. DRWR: A Differentiable Renderer without Rendering for Unsupervised 3D Structure Learning from Silhouette Images , 2020, ICML.

[36] Daniel Cohen-Or,et al. Active co-analysis of a set of shapes , 2012, ACM Trans. Graph..

[37] Jianfei Cai,et al. Weakly Supervised Fine-Grained Categorization With Part-Based Image Representation , 2016, IEEE Transactions on Image Processing.

[38] Pierre Sermanet,et al. Attention for Fine-Grained Categorization , 2014, ICLR.

[39] Leonidas J. Guibas,et al. A scalable active framework for region annotation in 3D shape collections , 2016, ACM Trans. Graph..

[40] Stefan Leutenegger,et al. Pairwise Decomposition of Image Sequences for Active Multi-view Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Junwei Han,et al. SeqViews2SeqLabels: Learning 3D Global Features via Aggregating Sequential Views by RNN With Attention , 2019, IEEE Transactions on Image Processing.

[42] Ligang Liu,et al. Co‐Segmentation of 3D Shapes via Subspace Clustering , 2012, Comput. Graph. Forum.

[43] Subhransu Maji,et al. Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[44] Qi Tian,et al. GIFT: Towards Scalable 3D Shape Retrieval , 2017, IEEE Transactions on Multimedia.

[45] Thomas A. Funkhouser,et al. The Princeton Shape Benchmark , 2004, Proceedings Shape Modeling Applications, 2004..

[46] Yizhou Yu,et al. Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification From the Bottom Up , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47] Guillaume Lavoué,et al. A framework for the objective evaluation of segmentation algorithms using a ground-truth of human segmented 3D-models , 2009, 2009 IEEE International Conference on Shape Modeling and Applications.

[48] Zhichao Zhou,et al. DeepPano: Deep Panoramic Representation for 3-D Shape Recognition , 2015, IEEE Signal Processing Letters.

[49] Matthias Zwicker,et al. Reconstructing 3D Shapes From Multiple Sketches Using Direct Shape Optimization , 2020, IEEE Transactions on Image Processing.

[50] Jian Sun,et al. View-GCN: View-Based Graph Convolutional Network for 3D Shape Analysis , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Matthias Zwicker,et al. Y^2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences , 2018, AAAI.

[52] Matthias Zwicker,et al. 3DViewGraph: Learning Global Features for 3D Shapes from A Graph of Unordered Views with Attention , 2019, IJCAI.

[53] Dong Tian,et al. FoldingNet: Point Cloud Auto-Encoder via Deep Grid Deformation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54] Subhransu Maji,et al. Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[55] Trevor Darrell,et al. Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[56] Matthias Zwicker,et al. Parts4Feature: Learning 3D Global Features from Generally Semantic Parts in Multiple Views , 2019, IJCAI.

[57] R. Wood,et al. Meshworm: A Peristaltic Soft Robot With Antagonistic Nickel Titanium Coil Actuators , 2013, IEEE/ASME Transactions on Mechatronics.

[58] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.

[59] Shiming Xiang,et al. Relation-Shape Convolutional Neural Network for Point Cloud Analysis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60] Sebastian Scherer,et al. VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[61] Andrew Zisserman,et al. Spatial Transformer Networks , 2015, NIPS.

[62] Leonidas J. Guibas,et al. ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[63] Matthias Zwicker,et al. SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments From 2D Coordinates , 2020, ECCV.

[64] Subhransu Maji,et al. Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[65] Aaron Hertzmann,et al. Learning 3D mesh segmentation and labeling , 2010, ACM Trans. Graph..

[66] Vladimir G. Kim,et al. Tags2Parts: Discovering Semantic Regions from Shape Tags , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[67] Jiaxin Li,et al. SO-Net: Self-Organizing Network for Point Cloud Analysis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[68] Junwei Han,et al. 3D2SeqViews: Aggregating Sequential Views for 3D Global Feature Learning by CNN With Hierarchical Attention Aggregation , 2019, IEEE Transactions on Image Processing.

[69] Tao Mei,et al. Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[70] Matthias Zwicker,et al. ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences , 2019, ACM Multimedia.

[71] Peter K. Allen,et al. Autotagging to improve text search for 3d models , 2008, Shape Modeling International.

[72] Subhransu Maji,et al. 3D Shape Segmentation with Projective Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73] Yue Wang,et al. Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[74] Jonathan Krause,et al. 3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[75] Yu-Shen Liu,et al. Point Cloud Completion by Skip-Attention Network With Hierarchical Folding , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).