Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification

This paper presents a novel algorithm which uses compact hash bits to greatly improve the efficiency of non-linear kernel SVM in very large scale visual classification problems. Our key idea is to represent each sample with compact hash bits, over which an inner product is defined to serve as the surrogate of the original nonlinear kernels. Then the problem of solving the nonlinear SVM can be transformed into solving a linear SVM over the hash bits. The proposed Hash-SVM enjoys dramatic storage cost reduction owing to the compact binary representation, as well as a (sub-)linear training complexity via linear SVM. As a critical component of Hash-SVM, we propose a novel hashing scheme for arbitrary non-linear kernels via random subspace projection in reproducing kernel Hilbert space. Our comprehensive analysis reveals a well behaved theoretic bound of the deviation between the proposed hashing-based kernel approximation and the original kernel function. We also derive requirements on the hash bits for achieving a satisfactory accuracy level. Several experiments on large-scale visual classification benchmarks are conducted, including one with over 1 million images. The results show that Hash-SVM greatly reduces the computational complexity (more than ten times faster in many cases) while keeping comparable accuracies.

[1]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[2]  Kristen Grauman,et al.  Kernelized Locality-Sensitive Hashing , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[4]  Kristen Grauman,et al.  Kernelized locality-sensitive hashing for scalable image search , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[5]  Ivor W. Tsang,et al.  Core Vector Machines: Fast SVM Training on Very Large Data Sets , 2005, J. Mach. Learn. Res..

[6]  Ameet Talwalkar,et al.  On the Impact of Kernel Approximation on Learning Accuracy , 2010, AISTATS.

[7]  Michael W. Mahoney,et al.  Approximating a Gram Matrix for Improved Kernel-Based Learning (Extended Abstract) , 2005 .

[8]  Avner Magen,et al.  Low rank matrix-valued chernoff bounds and approximate matrix multiplication , 2010, SODA '11.

[9]  John C. Platt Using Analytic QP and Sparseness to Speed Training of Support Vector Machines , 1998, NIPS.

[10]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[11]  Ameet Talwalkar,et al.  Sampling Methods for the Nyström Method , 2012, J. Mach. Learn. Res..

[12]  Dan Roth,et al.  Selective block minimization for faster convergence of limited memory large-scale linear models , 2011, KDD.

[13]  Shuicheng Yan,et al.  Weakly-supervised hashing in kernel space , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Moses Charikar,et al.  Similarity estimation techniques from rounding algorithms , 2002, STOC '02.

[15]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[16]  Luca Zanni,et al.  Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems , 2006, J. Mach. Learn. Res..

[17]  Petros Drineas,et al.  Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix , 2006, SIAM J. Comput..

[18]  Wei Liu,et al.  Scalable similarity search with optimized kernel hashing , 2010, KDD.

[19]  Sanjoy Dasgupta,et al.  An elementary proof of a theorem of Johnson and Lindenstrauss , 2003, Random Struct. Algorithms.

[20]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[21]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[22]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[23]  Koby Crammer,et al.  Trading representability for scalability: adaptive multi-hyperplane machine for nonlinear classification , 2011, KDD.

[24]  Andrew Zisserman,et al.  Efficient Additive Kernels via Explicit Feature Maps , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Gunnar Rätsch,et al.  Kernel PCA and De-Noising in Feature Spaces , 1998, NIPS.

[26]  Fuxin Li,et al.  Chebyshev Approximations to the Histogram $\chi^2$ Kernel , 2012 .

[27]  Edward Y. Chang,et al.  Parallelizing Support Vector Machines on Distributed Computers , 2007, NIPS.

[28]  Alexander J. Smola,et al.  Bundle Methods for Machine Learning , 2007, NIPS.

[29]  Tong Zhang,et al.  Solving large scale linear prediction problems using stochastic gradient descent algorithms , 2004, ICML.

[30]  Ping Li,et al.  Hashing Algorithms for Large-Scale Learning , 2011, NIPS.

[31]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[32]  Matthias W. Seeger,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[33]  Nathan Srebro,et al.  Beating SGD: Learning SVMs in Sublinear Time , 2011, NIPS.

[34]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[35]  Shih-Fu Chang,et al.  Accelerated Large Scale Optimization by Concomitant Hashing , 2012, ECCV.

[36]  Cristian Sminchisescu,et al.  Chebyshev approximations to the histogram χ2 kernel , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Alexander J. Smola,et al.  Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[38]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[39]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.