Paper Information - Sparkling Vector Machines

Sparkling Vector Machines

Support vector machines (SVMs) are widely used for classification tasks in the literature. A data augmentation algorithm has been proposed to improve the learning of these machines. Distributed SVMs are well studied, but a distributed implementation of SVMs with data augmentation has not been explored. This paper introduces a distributed version, called the sparkling vector machine, which is implemented in Apache Spark, a recent, advanced platform for distributed computing. We demonstrate the scalability of our proposed method on large-scale datasets with hundreds of millions of data points. The experimental results show that the predictive performance of our method is better than or comparable to that of the baselines, whilst the execution time is orders of magnitude lower.

[1] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[2] Ameet Talwalkar, et al. MLlib: Machine Learning in Apache Spark, 2015, J. Mach. Learn. Res.

[3] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.

[4] P. Baldi, et al. Deep Learning in High-Energy Physics: Improving the Search for Exotic Particles, 2014.

[5] Chia-Hua Ho, et al. Recent Advances of Large-Scale Linear Classification, 2012, Proceedings of the IEEE.

[6] L. Bottou, et al. Training Invariant Support Vector Machines using Selective Sampling, 2005.

[7] Nicholas G. Polson, et al. Data augmentation for support vector machines, 2011.

[8] Chia-Hua Ho, et al. An improved GLMNET for l1-regularized logistic regression, 2011, J. Mach. Learn. Res.