Structured Learning of Binary Codes with Column Generation for Optimizing Ranking Measures

Hashing methods aim to learn a set of hash functions that map the original features to compact binary codes while preserving similarity in the Hamming space. Hashing has proven to be a valuable tool for large-scale information retrieval. We propose a column-generation-based binary code learning framework for data-dependent hash function learning. Given a set of triplets that encode pairwise similarity comparisons, our method learns hash functions that preserve the relative comparison relations within a large-margin learning framework. Our method iteratively learns the best hash functions during the column generation procedure. Existing hashing methods optimize simple objectives such as the reconstruction error or graph-Laplacian-related loss functions, rather than the performance evaluation criteria of interest, namely multivariate performance measures such as the AUC and NDCG. Our column generation method can be further generalized from the triplet loss to a general structured learning framework that allows one to directly optimize multivariate performance measures. For general ranking measures, the resulting optimization problem can involve exponentially or infinitely many variables and constraints, which makes it more challenging than standard structured output learning. We use a combination of column generation and cutting-plane techniques to solve the optimization problem. To speed up training, we further explore stage-wise training and propose to optimize a simplified NDCG loss for efficient inference. We demonstrate the generality of our method by applying it to ranking prediction and image retrieval, and show that it outperforms several state-of-the-art hashing methods.
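To make the quantities named in the abstract concrete, the following is a minimal sketch of a triplet hinge loss on binary codes and of NDCG at a cutoff. It is not the column generation procedure itself. The function names, the linear form of the hash functions, the non-negative bit weights `w`, the unit margin, and the gain `2^rel - 1` in NDCG (one common definition) are all assumptions made for illustration.

```python
import numpy as np

def hash_codes(X, W, b=0.0):
    """Map rows of X to {-1, +1} codes using linear hash functions sign(Wx + b).

    Linear hash functions are an assumption of this sketch.
    """
    return np.where(X @ W.T + b >= 0, 1, -1)

def weighted_hamming(u, v, w):
    """Weighted Hamming distance: sum of w_j over the bits where u and v differ."""
    return float(np.sum(w * (u != v)))

def triplet_hinge_loss(codes, triplets, w, margin=1.0):
    """Mean hinge loss over triplets (i, j, k), where i should be closer to j than to k."""
    losses = []
    for i, j, k in triplets:
        d_pos = weighted_hamming(codes[i], codes[j], w)
        d_neg = weighted_hamming(codes[i], codes[k], w)
        # A triplet is violated when the dissimilar item is not at least
        # `margin` farther away than the similar item.
        losses.append(max(0.0, margin - (d_neg - d_pos)))
    return float(np.mean(losses))

def ndcg_at_k(relevance_in_ranked_order, k):
    """NDCG@k for relevance labels listed in the order the system ranked them."""
    rel = np.asarray(relevance_in_ranked_order, dtype=float)
    top = rel[:k]
    # The discount for rank position p (1-indexed) is 1 / log2(p + 1).
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = np.sum((2.0 ** top - 1.0) * discounts[:len(top)])
    ideal = np.sort(rel)[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) * discounts[:len(ideal)])
    return float(dcg / idcg) if idcg > 0 else 0.0

# Toy data (hypothetical): three 2-D points and three hash functions.
X = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]])
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
w = np.ones(3)
codes = hash_codes(X, W)
```

In this toy setup, `triplet_hinge_loss(codes, [(0, 1, 2)], w)` measures how well the codes keep point 0 closer to point 1 than to point 2, and `ndcg_at_k` scores a ranked list obtained by sorting database items by Hamming distance to a query.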
