A parallel SVM training algorithm on large-scale classification problems

Support vector machines (SVMs) have become a popular classification tool, but their main disadvantages are the large memory requirement and long computation time needed to handle very large datasets. To speed up SVM training, parallel methods have been proposed that split the problem into smaller subsets and train a network to assign samples to the different subsets. This paper proposes a parallel training algorithm for large-scale classification problems in which multiple SVM classifiers are applied and may be trained in a distributed computer system. As an improvement on the cascade SVM, the algorithm obtains the support vectors according to the mean distance of the data samples, and it uses alternating feedback rather than feeding back the whole final output, which avoids the problem that the learning results depend on how the data samples are distributed across the subsets. Experimental results on a real-world text dataset show that this parallel SVM training algorithm is efficient and achieves higher classification precision than the standard cascade SVM algorithm.
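For orientation, the baseline cascade structure that the abstract builds on can be sketched as follows. This is a minimal illustration of a standard cascade SVM only. It does not implement the distance-mean selection or the alternating feedback proposed here, since the abstract does not specify them. All function names, parameters, and the toy data are hypothetical. The linear trainer (sub-gradient descent on the hinge loss) and the margin-threshold rule for picking approximate support vectors are stand-in assumptions, chosen to keep the sketch free of dependencies.

```python
def train_linear_svm(samples, epochs=50, eta=0.05, lam=0.01):
    """Stand-in trainer (assumption): sub-gradient descent on the
    L2-regularised hinge loss. The bias is folded in as a constant feature."""
    w = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(epochs):
        for x, y in samples:
            xa = tuple(x) + (1.0,)
            margin = y * sum(wi * xi for wi, xi in zip(w, xa))
            w = [wi * (1.0 - eta * lam) for wi in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, xa)]
    return w


def decision(w, x):
    """Signed score of x under the linear model w (last entry is the bias)."""
    return sum(wi * xi for wi, xi in zip(w, tuple(x) + (1.0,)))


def support_vectors(samples, w, slack=0.25):
    """Approximate support vectors (assumption): keep the samples whose
    functional margin lies near the smallest margin in the set."""
    margins = [y * decision(w, x) for x, y in samples]
    threshold = max(1.0, min(margins)) * (1.0 + slack)
    return [s for s, m in zip(samples, margins) if m <= threshold]


def merge(a, b):
    """Union of two sample lists, preserving order and dropping duplicates."""
    return list(dict.fromkeys(a + b))


def cascade_svm(samples, n_subsets=4, passes=2):
    """Standard cascade: train on subsets, merge support vectors pairwise
    until one set remains, then feed that set back to the first layer.
    Assumes 1 <= n_subsets <= len(samples)."""
    size = -(-len(samples) // n_subsets)  # ceiling division
    subsets = [samples[i:i + size] for i in range(0, len(samples), size)]
    feedback = []
    for _ in range(passes):
        layer = [merge(s, feedback) for s in subsets]
        while True:
            # Each subset is trained independently, so this step is the one
            # that could be distributed across machines.
            layer = [support_vectors(s, train_linear_svm(s)) for s in layer]
            if len(layer) == 1:
                break
            layer = [merge(layer[i], layer[i + 1]) if i + 1 < len(layer)
                     else layer[i] for i in range(0, len(layer), 2)]
        feedback = layer[0]
    return train_linear_svm(feedback), feedback


def make_toy_data():
    """Hypothetical, linearly separable 2-D data with labels +1 and -1."""
    offsets = (0.0, 0.5, 1.0)
    pos = [((2.0 + dx, 2.0 + dy), 1) for dx in offsets for dy in offsets]
    neg = [((-2.0 - dx, -2.0 - dy), -1) for dx in offsets for dy in offsets]
    return [s for pair in zip(pos, neg) for s in pair]


if __name__ == "__main__":
    data = make_toy_data()
    w, svs = cascade_svm(data, n_subsets=4, passes=2)
    print("weights (w1, w2, bias):", w)
    print("approximate support vectors kept:", len(svs), "of", len(data))
```

In this sketch the whole final support vector set is fed back to every first-layer subset on each pass, which is the behaviour the proposed algorithm is described as replacing with alternating feedback.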