Multi-view Harmonized Bilinear Network for 3D Object Recognition

View-based methods have achieved considerable success in 3D object recognition tasks. Different from existing view-based methods pooling the view-wise features, we tackle this problem from the perspective of patches-to-patches similarity measurement. By exploiting the relationship between polynomial kernel and bilinear pooling, we obtain an effective 3D object representation by aggregating local convolutional features through bilinear pooling. Meanwhile, we harmonize different components inherited in the bilinear feature to obtain a more discriminative representation. To achieve an end-to-end trainable framework, we incorporate the harmonized bilinear pooling as a layer of a network, constituting the proposed Multi-view Harmonized Bilinear Network (MHBN). Systematic experiments conducted on two public benchmark datasets demonstrate the efficacy of the proposed methods in 3D object recognition.

[1]  R. Sakia The Box-Cox transformation technique: a review , 1992 .

[2]  Joshua B. Tenenbaum,et al.  Separating Style and Content with Bilinear Models , 2000, Neural Computation.

[3]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Amnon Shashua,et al.  Introduction to Machine Learning: Class Notes 67577 , 2009, ArXiv.

[5]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[6]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[7]  Trevor Darrell,et al.  Do Convnets Learn Correspondence? , 2014, NIPS.

[8]  Thomas Brox,et al.  Descriptor Matching with Convolutional Neural Networks: a Comparison to SIFT , 2014, ArXiv.

[9]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Cristian Sminchisescu,et al.  Matrix Backpropagation for Deep Networks with Structured Layers , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Shiguang Shan,et al.  Log-Euclidean Metric Learning on Symmetric Positive Definite Manifold with Application to Image Set Classification , 2015, ICML.

[12]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[14]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Cristian Sminchisescu,et al.  Training Deep Networks with Structured Layers by Matrix Backpropagation , 2015, ArXiv.

[16]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Stefan Leutenegger,et al.  Pairwise Decomposition of Image Sequences for Active Multi-view Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Leonidas J. Guibas,et al.  FPNN: Field Probing Neural Networks for 3D Data , 2016, NIPS.

[19]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Yang Gao,et al.  Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Longin Jan Latecki,et al.  GIFT: A Real-Time and Scalable 3D Shape Search Engine , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Leonidas J. Guibas,et al.  Volumetric and Multi-view CNNs for Object Classification on 3D Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[24]  Theodore Lim,et al.  Generative and Discriminative Voxel Modeling with Convolutional Neural Networks , 2016, ArXiv.

[25]  Reza Bosagh Zadeh,et al.  FusionNet: 3D Object Classification Using Multiple Data Representations , 2016, ArXiv.

[26]  Subhransu Maji,et al.  Improved Bilinear Pooling with CNNs , 2017, BMVC.

[27]  Thomas Brox,et al.  Orientation-boosted Voxel Nets for 3D Object Recognition , 2016, BMVC.

[28]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[30]  Xiao Liu,et al.  Kernel Pooling for Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Kaleem Siddiqi,et al.  Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition , 2019, BMVC.

[33]  Shu Kong,et al.  Low-Rank Bilinear Pooling for Fine-Grained Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Victor S. Lempitsky,et al.  Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).