L3-SVMs: Landmarks-based Linear Local Support Vectors Machines

Because of their ability to capture non-linearities in the data and to scale to large training sets, local Support Vector Machines (SVMs) have received special attention during the past decade. In this paper, we introduce a new local SVM method, called L$^3$-SVMs, which clusters the input space, carries out dimensionality reduction by projecting the data onto landmarks, and jointly learns a linear combination of local models. Simple and effective, our algorithm is also theoretically well-founded: using the framework of uniform stability, we show that our SVM formulation comes with generalization guarantees on the true risk. Experiments based on the simplest configuration of our model (i.e. randomly selected landmarks, linear projection, linear kernel) show that L$^3$-SVMs is very competitive with the state of the art and opens the door to exciting new lines of research.
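The pipeline described above (cluster the input space, project onto landmarks, fit local linear models) can be sketched in its simplest configuration as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The clusters come from plain k-means with farthest-first seeding. The landmarks are training points drawn uniformly at random. The projection is the linear kernel $\mu(x) = (\langle x, l_1\rangle, \dots, \langle x, l_L\rangle)$, standardized per cluster. Each cluster gets its own squared-hinge (L2-loss) linear SVM trained by gradient descent, fit independently rather than through the joint formulation used in the paper. All names (`L3SVMSketch`, `fit_l2svm`, `make_xor_blobs`, etc.) are hypothetical.

```python
import numpy as np


def nearest(X, centers):
    """Index of the closest center (squared Euclidean distance) for each row of X."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans(X, k, rng, n_iter=50):
    """Lloyd's k-means with farthest-first seeding; returns a (k, d) array of centroids."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Next seed: the point farthest from all seeds chosen so far.
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = nearest(X, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return centers


def fit_l2svm(Z, y, lam=1e-3, n_iter=3000):
    """Gradient descent on lam/2 * ||w||^2 + mean(max(0, 1 - y * (Z @ w + b))^2),
    i.e. an L2-regularized squared-hinge (L2-loss) linear SVM."""
    n, p = Z.shape
    w, b = np.zeros(p), 0.0
    # Step 1/L, with L an upper bound on the Lipschitz constant of the gradient.
    step = 1.0 / (lam + 2.0 * ((Z ** 2).sum() / n + 1.0))
    for _ in range(n_iter):
        slack = np.maximum(0.0, 1.0 - y * (Z @ w + b))
        w = w - step * (lam * w - 2.0 / n * Z.T @ (y * slack))
        b = b - step * (-2.0 / n * np.sum(y * slack))
    return w, b


class L3SVMSketch:
    """Hypothetical simplest configuration: k-means clusters, random landmarks,
    linear-kernel projection, and one linear SVM per cluster (fit independently)."""

    def __init__(self, n_clusters=2, n_landmarks=10, lam=1e-3, seed=0):
        self.k, self.n_landmarks, self.lam = n_clusters, n_landmarks, lam
        self.rng = np.random.default_rng(seed)

    def _features(self, X, j):
        # mu(x) = (<x, l_1>, ..., <x, l_L>), standardized with cluster j's statistics.
        return (X @ self.landmarks.T - self.mu[j]) / self.sd[j]

    def fit(self, X, y):
        self.centers = kmeans(X, self.k, self.rng)
        idx = self.rng.choice(len(X), size=self.n_landmarks, replace=False)
        self.landmarks = X[idx]  # landmarks: training points drawn uniformly at random
        labels = nearest(X, self.centers)
        self.mu, self.sd, self.models = [], [], []
        for j in range(self.k):  # assumes every cluster receives training points
            Xj, yj = X[labels == j], y[labels == j]
            Zj = Xj @ self.landmarks.T
            self.mu.append(Zj.mean(axis=0))
            self.sd.append(Zj.std(axis=0) + 1e-12)
            self.models.append(fit_l2svm(self._features(Xj, j), yj, self.lam))
        return self

    def predict(self, X):
        labels = nearest(X, self.centers)  # route each point to its cluster's model
        out = np.empty(len(X))
        for j, (w, b) in enumerate(self.models):
            mask = labels == j
            out[mask] = np.where(self._features(X[mask], j) @ w + b >= 0, 1.0, -1.0)
        return out


def make_xor_blobs(n, rng):
    """Blobs around x1 = -3 and x1 = +3; the label is sign(x2) in the left blob and
    -sign(x2) in the right one, so no single line separates the two classes."""
    side = rng.choice([-1.0, 1.0], size=n)
    x1 = 3.0 * side + rng.uniform(-1.0, 1.0, size=n)
    x2 = rng.choice([-1.0, 1.0], size=n) * rng.uniform(0.2, 1.0, size=n)
    return np.column_stack([x1, x2]), -side * np.sign(x2)


rng = np.random.default_rng(1)
X_train, y_train = make_xor_blobs(400, rng)
X_test, y_test = make_xor_blobs(400, rng)
model = L3SVMSketch(n_clusters=2, n_landmarks=10, seed=0).fit(X_train, y_train)
train_acc = np.mean(model.predict(X_train) == y_train)
test_acc = np.mean(model.predict(X_test) == y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

On this toy data the decision boundary flips sign between the two blobs, an XOR-like pattern that no single linear separator fits. Because each point is routed to its nearest centroid, each local model only has to learn one half-plane. The per-cluster standardization of the projected features is there only to keep plain gradient descent well-conditioned; it is a choice of this sketch, not something taken from the paper.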

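For context, the uniform-stability framework of Bousquet and Elisseeff (2002) turns a stability constant into a bound on the true risk. The generic statement is reproduced below. It is not the L$^3$-SVMs-specific bound, whose stability constant is not stated in this abstract. Write $A_S$ for the model learned from a sample $S$ of $m$ examples, $S^{\setminus i}$ for $S$ with its $i$-th example removed, $R$ for the true risk, and $\hat{R}_S$ for the empirical risk on $S$. If $A$ is $\beta$-uniformly stable and the loss satisfies $0 \le \ell \le M$, then for any $\delta \in (0,1)$:

$$
\beta\text{-uniform stability:}\quad \forall S,\ \forall i \in \{1,\dots,m\}:\ \sup_{z}\left|\ell(A_S, z) - \ell(A_{S^{\setminus i}}, z)\right| \le \beta,
$$

$$
\Pr_S\!\left[ R(A_S) \le \hat{R}_S(A_S) + 2\beta + (4m\beta + M)\sqrt{\frac{\ln(1/\delta)}{2m}} \right] \ge 1 - \delta.
$$

The bound shrinks at rate $O(1/\sqrt{m})$ only when $\beta = O(1/m)$. Bousquet and Elisseeff establish this rate for regularized kernel methods such as the SVM, with $\beta$ inversely proportional to the regularization parameter.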