Communication-Efficient Distributed Block Minimization for Nonlinear Kernel Machines

Nonlinear kernel machines often yield superior predictive performance on a variety of tasks; however, training them at scale poses severe computational challenges. In this paper, we show how to speed up kernel machine training on distributed systems. In particular, we develop a parallel block minimization framework and demonstrate its scalability for nonlinear kernel SVM and kernel logistic regression. The framework divides the problem into smaller subproblems by forming a block-diagonal approximation of the Hessian matrix. These subproblems are then solved approximately in parallel, and a communication-efficient line search procedure, which exploits the structure of kernel machines, ensures sufficient reduction of the objective function value. We prove a global linear convergence rate for the proposed method with a wide class of subproblem solvers, and our analysis covers strongly convex as well as some non-strongly convex functions. We apply our algorithm to solve large-scale kernel SVM problems on distributed systems and show a significant improvement over existing parallel solvers. As an example, on the covtype dataset with half a million samples, our algorithm obtains an approximate solution with 96% accuracy in 20 seconds using 32 machines, while all the other parallel kernel SVM solvers require more than 2000 seconds to achieve a solution with 95% accuracy. Moreover, our algorithm is the first distributed kernel SVM solver that can scale to massive datasets. On the KDDB dataset (20 million samples and 30 million features), our parallel solver computes the kernel SVM solution within half an hour using 32 machines with 640 cores in total, while existing solvers cannot scale to this dataset.
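As a rough illustration of the block-minimization idea described above (not the authors' implementation), the following single-machine Python sketch partitions the dual variables of a kernel SVM into blocks, solves each block subproblem approximately using only the corresponding diagonal block of the kernel matrix Q (the block-diagonal Hessian approximation), and then combines the block updates through a simple step-size search on the dual objective. All function names (parallel_block_minimization, solve_block, rbf_kernel) are hypothetical, the block loop stands in for the parallel workers, and the line search here evaluates the full objective directly rather than using the paper's communication-efficient procedure.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Gaussian (RBF) kernel between rows of X and Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def dual_objective(alpha, Q):
    # Kernel SVM dual (minimization form): 0.5 * a'Qa - sum(a).
    return 0.5 * alpha @ (Q @ alpha) - alpha.sum()

def solve_block(alpha, grad, Q_BB, block, C, inner_iters=5):
    """Approximately minimize the subproblem restricted to one block with a few
    coordinate-descent passes, holding all other blocks fixed (block-diagonal
    approximation: only Q_BB is needed)."""
    d = np.zeros(len(block))
    g = grad[block].copy()  # gradient of f(alpha + d) restricted to this block
    for _ in range(inner_iters):
        for j in range(len(block)):
            if Q_BB[j, j] <= 0:
                continue
            step = -g[j] / Q_BB[j, j]  # unconstrained coordinate minimizer
            new_dj = np.clip(alpha[block[j]] + d[j] + step, 0.0, C) - alpha[block[j]]
            delta = new_dj - d[j]
            if delta != 0.0:
                g += delta * Q_BB[:, j]  # keep the block-local gradient current
                d[j] = new_dj
    return d

def parallel_block_minimization(Q, C=1.0, n_blocks=4, outer_iters=20):
    n = Q.shape[0]
    alpha = np.zeros(n)
    blocks = np.array_split(np.random.permutation(n), n_blocks)
    for _ in range(outer_iters):
        grad = Q @ alpha - 1.0
        # Each block subproblem is independent and could run on its own worker.
        d = np.zeros(n)
        for block in blocks:
            d[block] = solve_block(alpha, grad, Q[np.ix_(block, block)], block, C)
        # Step-size search on the combined direction for sufficient decrease.
        best_beta, best_obj = 0.0, dual_objective(alpha, Q)
        for beta in (1.0, 0.5, 0.25, 0.125):
            cand = np.clip(alpha + beta * d, 0.0, C)
            obj = dual_objective(cand, Q)
            if obj < best_obj:
                best_beta, best_obj = beta, obj
        alpha = np.clip(alpha + best_beta * d, 0.0, C)
    return alpha

if __name__ == "__main__":
    # Toy synthetic problem just to exercise the sketch.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
    Q = (y[:, None] * y[None, :]) * rbf_kernel(X, X)
    alpha = parallel_block_minimization(Q)
    print("dual objective:", dual_objective(alpha, Q))
```

In a real distributed setting each worker would hold only its block of training points and of Q, and the combined direction and step size would be agreed upon with a small amount of communication; the sketch simulates this sequentially for clarity.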
