Sublinear Optimization for Machine Learning

We give sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions of these problems, such as SVDD, hard margin SVM, and $L_2$-SVM, for which sublinear-time algorithms were not known before. These new algorithms use a combination of novel sampling techniques and a new multiplicative update algorithm. We give lower bounds which show the running times of many of our algorithms to be nearly best possible in the unit-cost RAM model. We also give implementations of our algorithms in the semi-streaming setting, obtaining the first low-pass, polylogarithmic-space, sublinear-time algorithms achieving an arbitrary approximation factor.
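To make the combination of sampling and multiplicative updates concrete, the following is a minimal Python sketch of a sampling-based perceptron for a linear classifier. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the function name `sublinear_perceptron`, the step sizes, and the clipping rule are choices made here, and the rows of `A` are assumed to have Euclidean norm at most 1 with labels already folded into them. Each iteration reads one full row and one coordinate of every row, so it costs about $O(n + d)$ rather than $O(nd)$.

```python
import math
import random


def sublinear_perceptron(A, T, seed=0):
    """Sketch: approximately maximize min_i <A[i], x> over the unit ball.

    A is a list of n rows (lists of d floats), assumed to have norm <= 1.
    The step sizes below are illustrative assumptions, not tuned constants.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    eta = math.sqrt(math.log(max(n, 2)) / T)  # assumed dual step size
    step = 1.0 / math.sqrt(2.0 * T)           # assumed primal step size
    w = [1.0 / n] * n      # multiplicative weights over the examples
    y = [0.0] * d          # unprojected primal iterate
    x_avg = [0.0] * d      # running average of the projected iterates

    for _ in range(T):
        # Primal step: sample ONE example from the weights, move toward it.
        i = rng.choices(range(n), weights=w)[0]
        y = [yj + step * aij for yj, aij in zip(y, A[i])]
        scale = max(1.0, math.sqrt(sum(v * v for v in y)))
        x = [v / scale for v in y]            # project onto the unit ball
        x_avg = [s + xj / T for s, xj in zip(x_avg, x)]

        # Dual step: sample ONE coordinate j with probability x_j^2 / ||x||^2.
        sq = [xj * xj for xj in x]
        norm2 = sum(sq)
        if norm2 == 0.0:
            continue
        j = rng.choices(range(d), weights=sq)[0]
        if x[j] == 0.0:
            continue
        for k in range(n):
            # Unbiased one-coordinate estimate of <A[k], x>, then clipped.
            v = A[k][j] * norm2 / x[j]
            v = max(-1.0 / eta, min(1.0 / eta, v))
            # Multiplicative update: poorly classified examples gain weight.
            w[k] *= 1.0 - eta * v + (eta * v) ** 2
        total = sum(w)
        w = [wk / total for wk in w]          # renormalize to avoid overflow

    return x_avg


if __name__ == "__main__":
    rows = [[0.8, 0.3], [0.8, -0.3], [0.9, 0.0], [0.7, 0.2]]
    x = sublinear_perceptron(rows, T=200)
    print(min(sum(a * b for a, b in zip(r, x)) for r in rows))
```

The coordinate is sampled with probability proportional to $x_j^2$ so that the one-entry estimate has the right expectation, $\langle A_k, x \rangle$, before clipping. The clipping keeps each multiplicative factor between $3/4$ and $3$, so the weights stay positive.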
