Distributed Training of Large-Scale Logistic Models

Regularized multinomial logistic regression has emerged as one of the most common methods for data classification and analysis. With the advent of large-scale data, it is now common to encounter scenarios where the number of possible multinomial outcomes is large (on the order of thousands to tens of thousands) and the dimensionality is high. In such cases, training logistic models, or even simply iterating through all the model parameters, becomes prohibitively expensive. In this paper, we propose a training method for large-scale multinomial logistic models that breaks this bottleneck by enabling parallel optimization of the likelihood objective. Our experiments on large-scale datasets show an order-of-magnitude reduction in training time.
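For concreteness, the objective underlying such models can be written in the usual textbook form; the notation below (N training examples x_i with labels y_i, K outcome classes, d features, per-class weight vectors w_k, and an L2 penalty of strength \lambda) is standard and is not reproduced from this paper:

L(W) = - \sum_{i=1}^{N} \log \frac{\exp(w_{y_i}^{\top} x_i)}{\sum_{k=1}^{K} \exp(w_k^{\top} x_i)} + \lambda \sum_{k=1}^{K} \lVert w_k \rVert_2^2 .

The parameter matrix W has K \times d entries, and the normalizing sum in the denominator couples all K classes, so when K is in the tens of thousands and d is high even a single pass over the gradient is costly; this per-iteration cost is the bottleneck that a parallel decomposition of the likelihood objective targets.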
