Algorithms for Sparse Linear Classifiers in the Massive Data Setting

Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multi-pass algorithms for training sparse linear classifiers for high dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive data sets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function.

[1]  David Mackay,et al.  Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks , 1995 .

[2]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[3]  L. Wasserman,et al.  Computing Bayes Factors by Combining Simulation and Asymptotic Approximations , 1997 .

[4]  Wenjiang J. Fu Penalized Regressions: The Bridge versus the Lasso , 1998 .

[5]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[6]  Manfred Opper,et al.  A Bayesian approach to on-line learning , 1999 .

[7]  M. R. Osborne,et al.  A new approach to variable selection in least squares problems , 2000 .

[8]  Michael I. Jordan,et al.  Bayesian parameter estimation via variational methods , 2000, Stat. Comput..

[9]  Tom Minka,et al.  Expectation Propagation for approximate Bayesian inference , 2001, UAI.

[10]  Anil K. Jain,et al.  Bayesian learning of sparse classifiers , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[11]  Tom Minka,et al.  A family of algorithms for approximate Bayesian inference , 2001 .

[12]  Alexander J. Smola,et al.  Laplace Propagation , 2003, NIPS.

[13]  S. Sathiya Keerthi,et al.  A simple and efficient algorithm for gene selection using sparse logistic regression , 2003, Bioinform..

[14]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[15]  Yuan Qi,et al.  Predictive automatic relevance determination by expectation propagation , 2004, ICML.

[16]  Tong Zhang,et al.  On the Dual Formulation of Regularized Linear Systems with Convex Risks , 2002, Machine Learning.

[17]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[18]  Andrew W. Moore,et al.  Logistic regression for data mining and high-dimensional classification , 2004 .

[19]  Robert Tibshirani,et al.  The Entire Regularization Path for the Support Vector Machine , 2004, J. Mach. Learn. Res..

[20]  Tong Zhang,et al.  Text Categorization Based on Regularized Linear Classification Methods , 2001, Information Retrieval.

[21]  Lawrence Carin,et al.  Sparse multinomial logistic regression: fast algorithms and generalization bounds , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  H. Zou The Adaptive Lasso and Its Oracle Properties , 2006 .

[23]  David Madigan,et al.  Large-Scale Bayesian Logistic Regression for Text Categorization , 2007, Technometrics.

[24]  Stephen P. Boyd,et al.  An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression , 2007, J. Mach. Learn. Res..