Random Projections for Linear Support Vector Machines

Let X be a data matrix of rank ρ, whose rows represent n points in d-dimensional space. The linear support vector machine constructs a hyperplane separator that maximizes the 1-norm soft margin. We develop a new oblivious dimension reduction technique that is precomputed and can be applied to any input matrix X. We prove that, with high probability, the margin and the minimum enclosing ball in the feature space are preserved to within ε-relative error, ensuring generalization in the classification setting comparable to that in the original space. For regression, we show that the margin is preserved to ε-relative error with high probability. We present extensive experiments on real and synthetic data that support our theory.
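
To make the pipeline concrete: an oblivious projection is a single random matrix, drawn once without looking at the data, that maps X from d dimensions down to r << d dimensions before the SVM is trained. The sketch below illustrates this with a dense random sign matrix (entries ±1/√r, in the spirit of Achlioptas-style projections) followed by a 1-norm soft-margin linear SVM; it assumes NumPy and scikit-learn, and the helper name random_sign_projection and all parameter values are illustrative choices, not the paper's specific construction.

    import numpy as np
    from sklearn.svm import LinearSVC  # assumes scikit-learn is available

    def random_sign_projection(X, r, seed=None):
        """Map the n x d matrix X to r dimensions with a random sign
        matrix R scaled by 1/sqrt(r); R is drawn independently of X,
        so the map is oblivious and can be precomputed."""
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        R = rng.choice([-1.0, 1.0], size=(d, r)) / np.sqrt(r)
        return X @ R

    # Usage: train a hinge-loss (1-norm soft-margin) linear SVM
    # in the projected space instead of the original d-dimensional one.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10000))   # n = 500 points, d = 10000
    y = np.where(X[:, 0] + 0.1 * rng.standard_normal(500) > 0, 1, -1)
    X_proj = random_sign_projection(X, r=256, seed=1)
    clf = LinearSVC(C=1.0, loss="hinge", max_iter=10000).fit(X_proj, y)
    print(clf.score(X_proj, y))

Because R is independent of X, the same projection can be reused across datasets of the same dimensionality, and training cost now scales with r rather than d.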
