Tensorized Projection for High-Dimensional Binary Embedding

Embedding high-dimensional visual features (d-dimensional) into binary codes (b-dimensional) has shown advantages in various vision tasks such as object recognition and image retrieval. Meanwhile, recent works have demonstrated that, to fully utilize the representation power of high-dimensional features, it is critical to encode them into long binary codes rather than short ones, i.e., b ∼ O(d) (Sánchez and Perronnin 2011). However, generating long binary codes involves a large projection matrix and a high-dimensional matrix-vector multiplication. A dense b × d projection matrix stores bd entries and needs O(bd) operations per feature, so both memory and computation grow quickly with the code length. To tackle these problems, we propose Tensorized Projection (TP), which decomposes the projection matrix using the Tensor-Train (TT) format, a chain-like representation that allows tensors to be manipulated efficiently. As a result, TP can drastically reduce the computational complexity and memory cost. Moreover, by using the TT-format, TP regularizes the projection matrix against the risk of over-fitting and consequently achieves better performance than either a dense projection matrix (like ITQ (Gong and Lazebnik 2011)) or a sparse projection matrix (Xia et al. 2015). Experimental comparisons with state-of-the-art methods over various visual tasks demonstrate both the efficiency and the performance advantages of the proposed TP, especially when generating high-dimensional binary codes, e.g., when b ≥ d.
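The central idea, that the projection matrix is never formed but is stored as TT cores and multiplied with the feature one core at a time, can be sketched as follows, with the binary code taken as the sign of the result. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names (`tt_matvec`, `tt_to_dense`, `tp_encode`, `random_tt_cores`) are hypothetical, each core G_k is assumed to be stored with shape (r_{k-1}, m_k, n_k, r_k) where r_0 = r_K = 1, the cores are random rather than learned (the abstract does not describe how TP fits them), and binarization into {-1, +1} by sign is assumed.

```python
import numpy as np


def random_tt_cores(row_modes, col_modes, ranks, seed=0):
    """Hypothetical helper: random TT cores for illustration only (TP learns its cores)."""
    rng = np.random.default_rng(seed)
    return [
        rng.standard_normal((ranks[k], row_modes[k], col_modes[k], ranks[k + 1]))
        for k in range(len(row_modes))
    ]


def tt_matvec(cores, x):
    """Compute W @ x without forming W.

    Assumed layout: core k has shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_K = 1,
    so W has shape (m_1 * ... * m_K, n_1 * ... * n_K).
    """
    ns = [g.shape[2] for g in cores]
    # Axes of t: (output modes so far, current rank, current input mode, remaining input modes).
    t = x.reshape(1, 1, ns[0], -1)
    for k, g in enumerate(cores):
        # out[a, m, s, b] = sum over r, n of t[a, r, n, b] * g[r, m, n, s]
        out = np.einsum("arnb,rmns->amsb", t, g)
        a, m, s, b = out.shape
        if k + 1 < len(cores):
            # Merge the new output mode into axis 0 and split off the next input mode.
            t = out.reshape(a * m, s, ns[k + 1], b // ns[k + 1])
        else:
            t = out.reshape(-1)  # after the last core, s == 1 and b == 1
    return t


def tt_to_dense(cores):
    """Materialize the full matrix; useful only for checking tt_matvec on small sizes."""
    w = cores[0][0]  # shape (m_1, n_1, r_1), since r_0 = 1
    for g in cores[1:]:
        # w[a, b, r] * g[r, c, d, s] -> (a, c, b, d, s), then merge row and column modes.
        w = np.einsum("abr,rcds->acbds", w, g)
        a, c, b, d, s = w.shape
        w = w.reshape(a * c, b * d, s)
    return w[:, :, 0]  # r_K = 1


def tp_encode(cores, x):
    """Binary code in {-1, +1}: the sign of the TT-projected feature (0 maps to +1)."""
    return np.where(tt_matvec(cores, x) >= 0, 1, -1).astype(np.int8)


# Example: d = 64 = 4*4*4 input dimensions, b = 128 = 8*4*4 bits (so b >= d).
cores = random_tt_cores(row_modes=(8, 4, 4), col_modes=(4, 4, 4), ranks=(1, 3, 3, 1))
x = np.random.default_rng(1).standard_normal(64)
code = tp_encode(cores, x)
dense_params = 128 * 64
tt_params = sum(g.size for g in cores)
print(code.shape, dense_params, tt_params)  # prints: (128,) 8192 288
```

In this toy setting the dense projection needs 128 × 64 = 8192 parameters, while the TT cores with ranks (1, 3, 3, 1) need 96 + 144 + 48 = 288. The choice of mode sizes and ranks here is arbitrary; `tt_to_dense` is included only to verify the contraction, since materializing W at realistic sizes would defeat the purpose.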

[1] André Uschmajew et al., On Local Convergence of Alternating Schemes for Optimization of Convex Problems in the Tensor Train Format, 2013, SIAM J. Numer. Anal.

[2] S. V. Dolgov et al., Alternating Minimal Energy Methods for Linear Systems in Higher Dimensions, 2014.

[3] Le Song et al., Deep Fried Convnets, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[4] Wen Gao et al., To Project More or to Quantize More: Minimize Reconstruction Bias for Learning Compact Binary Codes, 2016, IJCAI.

[5] Daniel Kressner et al., Preconditioned Low-Rank Methods for High-Dimensional Elliptic PDE Eigenvalue Problems, 2011, Comput. Methods Appl. Math.

[6] Cordelia Schmid et al., Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search, 2008, ECCV.

[7] Sanjiv Kumar et al., Learning Binary Codes for High-Dimensional Data Using Bilinear Projections, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8] Ivan Oseledets et al., Tensor-Train Decomposition, 2011, SIAM J. Sci. Comput.

[9] R. Courant, Variational Methods for the Solution of Problems of Equilibrium and Vibrations, 1943.

[10] Antonio Torralba et al., Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, 2001, International Journal of Computer Vision.

[11] Baoxin Li et al., MSR-CNN: Applying motion salient region based descriptors for action recognition, 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[12] Reinhold Schneider et al., The Alternating Linear Scheme for Tensor Optimization in the Tensor Train Format, 2012, SIAM J. Sci. Comput.

[13] Junsong Yuan et al., Compressive Quantization for Fast Object Instance Search in Videos, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14] Junfeng Yang et al., A New Alternating Minimization Algorithm for Total Variation Image Reconstruction, 2008, SIAM J. Imaging Sci.

[15] Nicu Sebe et al., A Survey on Learning to Hash, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Geoffrey E. Hinton et al., ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[17] Shih-Fu Chang et al., Circulant Binary Embedding, 2014, ICML.

[18] Trevor Darrell et al., Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[19] Shih-Fu Chang et al., Fast Orthogonal Projection Based on Kronecker Product, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20] J. C. Gower et al., Projection Procrustes problems, 2004.

[21] Junsong Yuan et al., Fried Binary Embedding for High-Dimensional Visual Features, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[23] Jian Sun et al., Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Junsong Yuan et al., From Keyframes to Key Objects: Video Summarization by Representative Object Proposal Selection, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] David J. Schwab et al., Supervised Learning with Tensor Networks, 2016, NIPS.

[26] Alexander Novikov et al., Tensorizing Neural Networks, 2015, NIPS.

[27] Svetlana Lazebnik et al., Iterative quantization: A procrustean approach to learning binary codes, 2011, CVPR 2011.

[28] Alexander Novikov et al., Putting MRFs on a Tensor Train, 2014.

[29] Alexander J. Smola et al., Fastfood: Approximate Kernel Expansions in Loglinear Time, 2014, ArXiv.

[30] Alex Krizhevsky et al., Learning Multiple Layers of Features from Tiny Images, 2009.

[31] Mark J. Huiskes et al., The MIR flickr retrieval evaluation, 2008, MIR '08.

[32] Jian Sun et al., Sparse projections for high-dimensional binary codes, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Trevor Darrell et al., Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing), 2006.

[34] Cordelia Schmid et al., Aggregating Local Image Descriptors into Compact Codes, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35] Florent Perronnin et al., High-dimensional signature compression for large-scale image classification, 2011, CVPR 2011.

[36] Junsong Yuan et al., Is My Object in This Video? Reconstruction-based Object Search in Videos, 2017, IJCAI.

[37] Junsong Yuan et al., Query Adaptive Instance Search using Object Sketches, 2016, ACM Multimedia.