Bundle CDN: A Highly Parallelized Approach for Large-Scale ℓ1-Regularized Logistic Regression

Parallel coordinate descent algorithms have emerged in response to the growing demand for large-scale optimization. Previous algorithms, however, are typically limited by divergence under a high degree of parallelism (DOP), or require data pre-processing to avoid divergence. To better exploit parallelism, we propose a coordinate-descent-based parallel algorithm that requires no data pre-processing, termed Bundle Coordinate Descent Newton (BCDN), and apply it to large-scale ℓ1-regularized logistic regression. BCDN randomly partitions the feature set into Q non-overlapping subsets (bundles) of P features each and processes the bundles in a Gauss-Seidel manner. For each bundle, it computes the descent directions for the P features in parallel and performs a P-dimensional Armijo line search to obtain the step size. Through a theoretical analysis of global convergence, we show that BCDN is guaranteed to converge even at a high DOP. Experimental evaluations on five public datasets show that BCDN better exploits parallelism and outperforms state-of-the-art algorithms in speed without losing test accuracy.
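Below is a minimal, single-process NumPy sketch of one outer pass as the abstract describes it: random bundles of P features, per-feature Newton directions computed from the same iterate, and one P-dimensional Armijo line search per bundle. The function names (bcdn_epoch, _newton_direction), the dense-matrix representation, and the default constants (beta, sigma, the step-size floor) are illustrative assumptions rather than the paper's implementation, and the per-feature loop is written serially where BCDN evaluates the directions in parallel.

```python
import numpy as np

def _sigmoid(z):
    z = np.clip(z, -30.0, 30.0)          # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

def _objective(X, y, w, C):
    """F(w) = ||w||_1 + C * sum_i log(1 + exp(-y_i x_i^T w))."""
    margins = y * (X @ w)
    return np.abs(w).sum() + C * np.logaddexp(0.0, -margins).sum()

def _newton_direction(xj, y, p, w_j, C):
    """Closed-form 1-D Newton step for one feature of the l1-regularized
    logistic loss (the soft-threshold rule used by CDN-style solvers)."""
    g = C * np.dot((p - 1.0) * y, xj)                # smooth-loss gradient
    h = C * np.dot(xj * xj, p * (1.0 - p)) + 1e-12   # Hessian diagonal
    if g + 1.0 <= h * w_j:
        return -(g + 1.0) / h
    if g - 1.0 >= h * w_j:
        return -(g - 1.0) / h
    return -w_j

def bcdn_epoch(X, y, w, C, P, beta=0.5, sigma=0.01, seed=None):
    """One pass over all features: random bundles of P features processed
    in a Gauss-Seidel order; per-bundle directions + Armijo line search."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[1])
    for start in range(0, order.size, P):
        bundle = order[start:start + P]
        margins = y * (X @ w)
        p = _sigmoid(margins)
        # Directions for all P features of the bundle use the same iterate w,
        # so this loop is the part BCDN runs in parallel (serial here).
        d = np.zeros_like(w)
        for j in bundle:
            d[j] = _newton_direction(X[:, j], y, p, w[j], C)
        # P-dimensional Armijo line search on the nonsmooth objective.
        grad = C * (X.T @ ((p - 1.0) * y))
        delta = grad @ d + np.abs(w + d).sum() - np.abs(w).sum()
        f_old, step = _objective(X, y, w, C), 1.0
        while _objective(X, y, w + step * d, C) - f_old > sigma * step * delta:
            step *= beta
            if step < 1e-10:
                break
        w = w + step * d
    return w
```

Repeatedly calling, say, w = bcdn_epoch(X, y, np.zeros(X.shape[1]), C=1.0, P=64) until the objective stalls mimics the outer loop. The design point the sketch illustrates is that all P directions in a bundle are derived from the same iterate and then accepted through a single shared line search, which is what lets them be computed concurrently while guarding against the divergence that fully independent per-coordinate updates can cause at a high DOP.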
