Sublinear Algorithms for Penalized Logistic Regression in Massive Datasets

Penalized logistic regression (PLR) is a widely used supervised learning model. In this paper, we consider its application to large-scale data problems and adopt a stochastic primal-dual approach for solving PLR. In particular, we employ a random sampling technique in the primal step and a multiplicative weights method in the dual step. This combination yields an optimization method whose running time is sublinear in both the volume and the dimensionality of the training data. We develop concrete algorithms for PLR with l2-norm and l1-norm penalties. Experimental results on several large-scale, high-dimensional datasets demonstrate both the efficiency and the accuracy of our algorithms.
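To make the primal-dual structure concrete, the following is a minimal sketch of the general scheme the abstract describes, not the paper's actual algorithm: a multiplicative-weights update maintains a dual distribution over training examples, and the primal step is a stochastic gradient update on an example sampled from that distribution. All function and parameter names here (`sublinear_plr`, `eta_w`, `eta_p`, etc.) are illustrative assumptions, and the dual update below touches every example for clarity, whereas a truly sublinear method would also sample there.

```python
import numpy as np

def sublinear_plr(X, y, lam=0.01, T=1000, eta_w=0.1, eta_p=0.05, seed=0):
    """Illustrative stochastic primal-dual solver for l2-penalized
    logistic regression (a sketch of the general scheme, not the
    paper's algorithm)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    p = np.full(n, 1.0 / n)            # dual distribution over examples
    w_sum = np.zeros(d)
    for _ in range(T):
        # Primal step: sample one example from the dual distribution
        # and take a stochastic gradient step on its penalized loss.
        i = rng.choice(n, p=p)
        m = y[i] * (X[i] @ w)           # margin of the sampled example
        g = -y[i] * X[i] / (1.0 + np.exp(m)) + lam * w
        w -= eta_w * g
        # Dual step: multiplicative weights on per-example logistic
        # losses, upweighting hard examples. (Computed densely here
        # for clarity; a sublinear method would estimate these by
        # sampling coordinates.)
        margins = np.clip(y * (X @ w), -30.0, 30.0)
        losses = np.log1p(np.exp(-margins))
        p *= np.exp(eta_p * losses)
        p /= p.sum()
        w_sum += w
    return w_sum / T                    # averaged primal iterate
```

The averaged iterate is returned because primal-dual methods of this kind typically guarantee convergence for the running average rather than the last iterate.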
