论文信息 - 3D Object Representation Learning: A Set-to-Set Matching Perspective

3D Object Representation Learning: A Set-to-Set Matching Perspective

In this paper, we tackle the 3D object representation learning from the perspective of set-to-set matching. Given two 3D objects, calculating their similarity is formulated as the problem of set-to-set similarity measurement between two set of local patches. As local convolutional features from convolutional feature maps are natural representations of local patches, the set-to-set matching between sets of local patches is further converted into a local features pooling problem. To highlight good matchings and suppress the bad ones, we exploit two pooling methods: 1) bilinear pooling and 2) VLAD pooling. We analyze their effectiveness in enhancing the set-to-set matching and meanwhile establish their connection. Moreover, to balance different components inherent in a bilinear-pooled feature, we propose the harmonized bilinear pooling operation, which follows the spirits of intra-normalization used in VLAD pooling. To achieve an end-to-end trainable framework, we implement the proposed harmonized bilinear pooling and intra-normalized VLAD as two layers to construct two types of neural network, multi-view harmonized bilinear network (MHBN) and multi-view VLAD network (MVLADN). Systematic experiments conducted on two public benchmark datasets demonstrate the efficacy of the proposed MHBN and MVLADN in 3D object recognition.

[1] Jianxiong Xiao,et al. 3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Stefan Leutenegger,et al. Pairwise Decomposition of Image Sequences for Active Multi-view Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Andrew Zisserman,et al. Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4] Thomas Brox,et al. Orientation-boosted Voxel Nets for 3D Object Recognition , 2016, BMVC.

[5] Leonidas J. Guibas,et al. Volumetric and Multi-view CNNs for Object Classification on 3D Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Cristian Sminchisescu,et al. Training Deep Networks with Structured Layers by Matrix Backpropagation , 2015, ArXiv.

[7] Subhransu Maji,et al. SPLATNet: Sparse Lattice Networks for Point Cloud Processing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8] Victor S. Lempitsky,et al. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9] Qi Tian,et al. GIFT: Towards Scalable 3D Shape Retrieval , 2017, IEEE Transactions on Multimedia.

[10] Zhichao Zhou,et al. DeepPano: Deep Panoramic Representation for 3-D Shape Recognition , 2015, IEEE Signal Processing Letters.

[11] Erik G. Larsson,et al. Lecture Notes: Floating-Point Numbers , 2010 .

[12] Cristian Sminchisescu,et al. Matrix Backpropagation for Deep Networks with Structured Layers , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Yang Gao,et al. Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Subhransu Maji,et al. Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16] Anoop Cherian,et al. Sparse Coding for Third-Order Super-Symmetric Tensor Descriptors with Application to Texture Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Xiao Liu,et al. Kernel Pooling for Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Song-Chun Zhu,et al. Learning Descriptor Networks for 3D Shape Synthesis and Analysis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19] Ping Li,et al. Fast and Compact Bilinear Pooling by Shifted Random Maclaurin , 2021, AAAI.

[20] Ye Duan,et al. PointGrid: A Deep Network for 3D Shape Understanding , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21] Theodore Lim,et al. Generative and Discriminative Voxel Modeling with Convolutional Neural Networks , 2016, ArXiv.

[22] Sebastian Scherer,et al. VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[23] Andrew Zisserman,et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[24] Shiguang Shan,et al. Log-Euclidean Metric Learning on Symmetric Positive Definite Manifold with Application to Image Set Classification , 2015, ICML.

[25] Shu Kong,et al. Low-Rank Bilinear Pooling for Fine-Grained Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Longin Jan Latecki,et al. GIFT: A Real-Time and Scalable 3D Shape Search Engine , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Subhransu Maji,et al. Improved Bilinear Pooling with CNNs , 2017, BMVC.

[28] Subhransu Maji,et al. Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Didier Stricker,et al. Learning 3D Shapes as Multi-Layered Height-maps using 2D Convolutional Networks , 2018, ECCV.

[31] Leonidas J. Guibas,et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Anoop Cherian,et al. Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[33] Andrew Zisserman,et al. All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Leonidas J. Guibas,et al. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[35] Giorgos Tolias,et al. Fine-Tuning CNN Image Retrieval with No Human Annotation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36] Alexander J. Smola,et al. Deep Sets , 2017, 1703.06114.

[37] Yue Gao,et al. GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38] Kaleem Siddiqi,et al. Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition , 2019, BMVC.

[39] Junsong Yuan,et al. Multi-view Harmonized Bilinear Network for 3D Object Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40] Joshua B. Tenenbaum,et al. Separating Style and Content with Bilinear Models , 2000, Neural Computation.

[41] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[42] Reza Bosagh Zadeh,et al. FusionNet: 3D Object Classification Using Multiple Data Representations , 2016, ArXiv.

[43] R. Sakia. The Box-Cox transformation technique: a review , 1992 .

[44] Jiajun Wu,et al. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[45] Leonidas J. Guibas,et al. FPNN: Field Probing Neural Networks for 3D Data , 2016, NIPS.

[46] Cristian Sminchisescu,et al. Semantic Segmentation with Second-Order Pooling , 2012, ECCV.