3D Object Representation Learning: A Set-to-Set Matching Perspective

In this paper, we tackle the 3D object representation learning from the perspective of set-to-set matching. Given two 3D objects, calculating their similarity is formulated as the problem of set-to-set similarity measurement between two set of local patches. As local convolutional features from convolutional feature maps are natural representations of local patches, the set-to-set matching between sets of local patches is further converted into a local features pooling problem. To highlight good matchings and suppress the bad ones, we exploit two pooling methods: 1) bilinear pooling and 2) VLAD pooling. We analyze their effectiveness in enhancing the set-to-set matching and meanwhile establish their connection. Moreover, to balance different components inherent in a bilinear-pooled feature, we propose the harmonized bilinear pooling operation, which follows the spirits of intra-normalization used in VLAD pooling. To achieve an end-to-end trainable framework, we implement the proposed harmonized bilinear pooling and intra-normalized VLAD as two layers to construct two types of neural network, multi-view harmonized bilinear network (MHBN) and multi-view VLAD network (MVLADN). Systematic experiments conducted on two public benchmark datasets demonstrate the efficacy of the proposed MHBN and MVLADN in 3D object recognition.

[1]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Stefan Leutenegger,et al.  Pairwise Decomposition of Image Sequences for Active Multi-view Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Thomas Brox,et al.  Orientation-boosted Voxel Nets for 3D Object Recognition , 2016, BMVC.

[5]  Leonidas J. Guibas,et al.  Volumetric and Multi-view CNNs for Object Classification on 3D Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Cristian Sminchisescu,et al.  Training Deep Networks with Structured Layers by Matrix Backpropagation , 2015, ArXiv.

[7]  Subhransu Maji,et al.  SPLATNet: Sparse Lattice Networks for Point Cloud Processing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Victor S. Lempitsky,et al.  Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Qi Tian,et al.  GIFT: Towards Scalable 3D Shape Retrieval , 2017, IEEE Transactions on Multimedia.

[10]  Zhichao Zhou,et al.  DeepPano: Deep Panoramic Representation for 3-D Shape Recognition , 2015, IEEE Signal Processing Letters.

[11]  Erik G. Larsson,et al.  Lecture Notes: Floating-Point Numbers , 2010 .

[12]  Cristian Sminchisescu,et al.  Matrix Backpropagation for Deep Networks with Structured Layers , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Yang Gao,et al.  Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Anoop Cherian,et al.  Sparse Coding for Third-Order Super-Symmetric Tensor Descriptors with Application to Texture Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Xiao Liu,et al.  Kernel Pooling for Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Song-Chun Zhu,et al.  Learning Descriptor Networks for 3D Shape Synthesis and Analysis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Ping Li,et al.  Fast and Compact Bilinear Pooling by Shifted Random Maclaurin , 2021, AAAI.

[20]  Ye Duan,et al.  PointGrid: A Deep Network for 3D Shape Understanding , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Theodore Lim,et al.  Generative and Discriminative Voxel Modeling with Convolutional Neural Networks , 2016, ArXiv.

[22]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[23]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[24]  Shiguang Shan,et al.  Log-Euclidean Metric Learning on Symmetric Positive Definite Manifold with Application to Image Set Classification , 2015, ICML.

[25]  Shu Kong,et al.  Low-Rank Bilinear Pooling for Fine-Grained Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Longin Jan Latecki,et al.  GIFT: A Real-Time and Scalable 3D Shape Search Engine , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Subhransu Maji,et al.  Improved Bilinear Pooling with CNNs , 2017, BMVC.

[28]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Didier Stricker,et al.  Learning 3D Shapes as Multi-Layered Height-maps using 2D Convolutional Networks , 2018, ECCV.

[31]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Anoop Cherian,et al.  Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[33]  Andrew Zisserman,et al.  All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[35]  Giorgos Tolias,et al.  Fine-Tuning CNN Image Retrieval with No Human Annotation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Alexander J. Smola,et al.  Deep Sets , 2017, 1703.06114.

[37]  Yue Gao,et al.  GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Kaleem Siddiqi,et al.  Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition , 2019, BMVC.

[39]  Junsong Yuan,et al.  Multi-view Harmonized Bilinear Network for 3D Object Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Joshua B. Tenenbaum,et al.  Separating Style and Content with Bilinear Models , 2000, Neural Computation.

[41]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[42]  Reza Bosagh Zadeh,et al.  FusionNet: 3D Object Classification Using Multiple Data Representations , 2016, ArXiv.

[43]  R. Sakia The Box-Cox transformation technique: a review , 1992 .

[44]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[45]  Leonidas J. Guibas,et al.  FPNN: Field Probing Neural Networks for 3D Data , 2016, NIPS.

[46]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.