Sparkling Vector Machines

Support vector machines (SVMs) are widely used for classification tasks in the literature. A data augmentation algorithm has been proposed to improve the learning of SVMs. Distributed SVMs are well studied, but a distributed implementation of SVMs with data augmentation has not yet been explored. This paper introduces such a distributed version, called the sparkling vector machine, which is implemented in Apache Spark, a modern platform for distributed computing. We demonstrate the scalability of the proposed method on large-scale datasets with hundreds of millions of data points. The experimental results show that the predictive performance of our method is better than or comparable to that of the baselines, whilst its execution time is orders of magnitude lower.
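The abstract does not spell out the algorithm. As a rough illustration of why SVM training with data augmentation lends itself to a Spark-style map/reduce, the following is a minimal sketch. It assumes the EM form of the location-scale mixture representation of the hinge loss (as in Polson and Scott's data augmentation for SVMs) with a Gaussian prior on the weights. This is an assumption, not the paper's implementation. The helper names `partition_stats` and `em_svm`, the regularization constant `reg`, and the clipping constant `eps` are hypothetical. Partitions are simulated with NumPy arrays rather than a Spark RDD.

```python
import numpy as np

def partition_stats(X, y, w, eps=1e-6):
    """Hypothetical "map" step: sufficient statistics of one data partition."""
    u = y * (X @ w)                                   # margins y_i x_i^T w
    gamma = 1.0 / np.maximum(np.abs(1.0 - u), eps)   # E-step: E[1/lambda_i | w], clipped
    A = (X * gamma[:, None]).T @ X                    # sum_i gamma_i x_i x_i^T
    b = X.T @ (y * (1.0 + gamma))                     # sum_i y_i (1 + gamma_i) x_i
    return A, b

def em_svm(partitions, reg=1.0, n_iter=50):
    """Hypothetical driver loop: sum per-partition statistics, then solve a d x d system."""
    d = partitions[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(n_iter):
        stats = [partition_stats(Xp, yp, w) for Xp, yp in partitions]  # "map"
        A = sum(s[0] for s in stats) + reg * np.eye(d)                  # "reduce" + Gaussian prior
        b = sum(s[1] for s in stats)
        w = np.linalg.solve(A, b)                                       # M-step on the driver
    return w

# Synthetic demo: linearly separable labels, data split into 4 simulated partitions.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = np.sign(X @ np.array([2.0, -1.0, 0.5]))
partitions = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w = em_svm(partitions)
accuracy = np.mean(np.sign(X @ w) == y)
print(f"training accuracy: {accuracy:.3f}")
```

In an actual Spark job one would presumably broadcast `w` and aggregate the per-partition `(A, b)` pairs, for example with `RDD.treeAggregate`. In this sketch only a d×d linear system is solved on the driver, so the per-iteration communication cost depends on the feature dimension rather than on the number of data points.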