Paper Information - Tensorized Projection for High-Dimensional Binary Embedding

Tensorized Projection for High-Dimensional Binary Embedding

Embedding high-dimensional visual features (d-dimensional) into binary codes (b-dimensional) has shown advantages in various vision tasks such as object recognition and image retrieval. Meanwhile, recent works have demonstrated that, to fully utilize the representation power of high-dimensional features, it is critical to encode them into long binary codes rather than short ones, i.e., b ∼ O(d) (Sánchez and Perronnin 2011). However, generating long binary codes involves a large projection matrix and a high-dimensional matrix-vector multiplication, and is therefore both memory-intensive and computationally intensive. To tackle these problems, we propose Tensorized Projection (TP), which decomposes the projection matrix using the Tensor-Train (TT) format, a chain-like representation that allows tensors to be operated on efficiently. As a result, TP can drastically reduce the computational complexity and memory cost. Moreover, by using the TT-format, TP can regularize the projection matrix against the risk of over-fitting and consequently achieve better performance than using either a dense projection matrix (like ITQ (Gong and Lazebnik 2011)) or a sparse projection matrix (Xia et al. 2015). Experimental comparisons with state-of-the-art methods over various visual tasks demonstrate both the efficiency and performance advantages of our proposed TP, especially when generating high-dimensional binary codes, e.g., when b ≥ d.
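To make the idea of a TT-format projection concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the b × d projection matrix is factored into cores of shape (r_prev, m_k, n_k, r_next) with b = ∏ m_k and d = ∏ n_k, and that binary codes are obtained by taking the sign of the projection; neither detail is stated in the abstract. The function names `tt_matvec` and `tt_to_dense`, the mode sizes, the TT-ranks, and the random (unlearned) cores are all hypothetical choices made for illustration.

```python
import numpy as np


def tt_matvec(cores, x):
    """Multiply a TT-format matrix by a vector without forming the dense matrix.

    cores[k] has shape (r_prev, m_k, n_k, r_next), with boundary ranks equal to 1.
    The represented matrix has shape (prod(m_k), prod(n_k)).
    """
    # State layout: (output indices done so far, current rank, remaining inputs).
    t = x.reshape(1, 1, -1)
    for core in cores:
        r_prev, m, n, r_next = core.shape
        done = t.shape[0]
        # Split off the next input mode n_k from the remaining inputs.
        t = t.reshape(done, r_prev, n, -1)
        # Contract over the incoming rank and the input mode.
        t = np.einsum("aknr,kmnl->amlr", t, core)
        # Fold the new output mode m_k into the "done" axis.
        t = t.reshape(done * m, r_next, -1)
    return t.reshape(-1)


def tt_to_dense(cores):
    """Rebuild the dense matrix from its cores (only for checking small cases)."""
    W = np.ones((1, 1, 1))  # (rows, cols, rank)
    for core in cores:
        m, n, r_next = core.shape[1:]
        W = np.einsum("abk,kmnl->ambnl", W, core)
        W = W.reshape(W.shape[0] * m, W.shape[2] * n, r_next)
    return W[:, :, 0]


# Hypothetical toy sizes: d = 4*4*2 = 32 inputs, b = 4*4*4 = 64 output bits.
rng = np.random.default_rng(0)
out_modes, in_modes, ranks = (4, 4, 4), (4, 4, 2), (1, 3, 3, 1)
cores = [
    rng.standard_normal((ranks[k], out_modes[k], in_modes[k], ranks[k + 1]))
    for k in range(len(out_modes))
]

x = rng.standard_normal(int(np.prod(in_modes)))
y = tt_matvec(cores, x)            # real-valued projection of length b
code = np.where(y >= 0, 1, -1)     # assumed sign binarization

n_tt = sum(c.size for c in cores)                           # parameters stored in TT form
n_dense = int(np.prod(out_modes)) * int(np.prod(in_modes))  # parameters in a dense b x d matrix
```

In this toy setting the cores hold far fewer parameters than the dense matrix they represent, which is the kind of memory saving the abstract describes; the actual savings depend on the chosen mode sizes and TT-ranks.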

[1] André Uschmajew,et al. On Local Convergence of Alternating Schemes for Optimization of Convex Problems in the Tensor Train Format , 2013, SIAM J. Numer. Anal..

[2] S. V. Dolgov,et al. Alternating Minimal Energy Methods for Linear Systems in Higher Dimensions , 2014 .

[3] Le Song,et al. Deep Fried Convnets , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[4] Wen Gao,et al. To Project More or to Quantize More: Minimize Reconstruction Bias for Learning Compact Binary Codes , 2016, IJCAI.

[5] Daniel Kressner,et al. Preconditioned Low-Rank Methods for High-Dimensional Elliptic PDE Eigenvalue Problems , 2011, Comput. Methods Appl. Math..

[6] Cordelia Schmid,et al. Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[7] Sanjiv Kumar,et al. Learning Binary Codes for High-Dimensional Data Using Bilinear Projections , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8] Ivan Oseledets,et al. Tensor-Train Decomposition , 2011, SIAM J. Sci. Comput..

[9] R. Courant. Variational methods for the solution of problems of equilibrium and vibrations , 1943 .

[10] Antonio Torralba,et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[11] Baoxin Li,et al. MSR-CNN: Applying motion salient region based descriptors for action recognition , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[12] Reinhold Schneider,et al. The Alternating Linear Scheme for Tensor Optimization in the Tensor Train Format , 2012, SIAM J. Sci. Comput..

[13] Junsong Yuan,et al. Compressive Quantization for Fast Object Instance Search in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14] Junfeng Yang,et al. A New Alternating Minimization Algorithm for Total Variation Image Reconstruction , 2008, SIAM J. Imaging Sci..

[15] Nicu Sebe,et al. A Survey on Learning to Hash , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[17] Shih-Fu Chang,et al. Circulant Binary Embedding , 2014, ICML.

[18] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[19] Shih-Fu Chang,et al. Fast Orthogonal Projection Based on Kronecker Product , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20] J. C. Gower,et al. Projection Procrustes problems , 2004 .

[21] Junsong Yuan,et al. Fried Binary Embedding for High-Dimensional Visual Features , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[23] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Junsong Yuan,et al. From Keyframes to Key Objects: Video Summarization by Representative Object Proposal Selection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] David J. Schwab,et al. Supervised Learning with Tensor Networks , 2016, NIPS.

[26] Alexander Novikov,et al. Tensorizing Neural Networks , 2015, NIPS.

[27] Svetlana Lazebnik,et al. Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[28] Alexander Novikov,et al. Putting MRFs on a Tensor Train , 2014 .

[29] Alexander J. Smola,et al. Fastfood: Approximate Kernel Expansions in Loglinear Time , 2014, ArXiv.

[30] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .

[31] Mark J. Huiskes,et al. The MIR flickr retrieval evaluation , 2008, MIR '08.

[32] Jian Sun,et al. Sparse projections for high-dimensional binary codes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Trevor Darrell,et al. Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing) , 2006 .

[34] Cordelia Schmid,et al. Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35] Florent Perronnin,et al. High-dimensional signature compression for large-scale image classification , 2011, CVPR 2011.

[36] Junsong Yuan,et al. Is My Object in This Video? Reconstruction-based Object Search in Videos , 2017, IJCAI.

[37] Junsong Yuan,et al. Query Adaptive Instance Search using Object Sketches , 2016, ACM Multimedia.