Parallelized FPA-SVM: Parallelized parameter selection and classification using Flower Pollination Algorithm and Support Vector Machine

Support Vector Machine (SVM) is one of the most popular machine learning algorithms for classification tasks and helps organizations improve their efficiency in many ways. Many studies have sought to improve the speed, accuracy, and/or scalability of SVM. The algorithm has parameters that must be tuned precisely for it to perform well. This work proposes a novel parallelized parameter selection method that uses the Flower Pollination Algorithm (FPA) to quickly find the optimal parameters of SVM. In particular, the MapReduce programming model used in big data frameworks is applied to both FPA and SVM, yielding a fully distributed algorithm that supports large datasets. Experimental results on real datasets show that Parallelized FPA-SVM generates optimal models quickly while maintaining high accuracy.
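
The parameter search described above can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names are hypothetical, the two searched parameters are assumed to be log2(C) and log2(gamma) of an RBF-kernel SVM, and `toy_validation_error` is a stand-in for the cross-validated SVM error that a real run would compute. Python's built-in `map` and `functools.reduce` only mimic the shape of a MapReduce job (score candidates independently, then combine to find the best); nothing here is actually distributed.

```python
import math
import random
from functools import reduce


def levy_step(rng, lam=1.5):
    """Draw one Levy-distributed step length using Mantegna's algorithm."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / lam)


def toy_validation_error(params):
    """Hypothetical stand-in for SVM validation error (assumption, not real data).

    Its minimum is placed arbitrarily at log2(C) = 3, log2(gamma) = -5.
    """
    log_c, log_gamma = params
    return (log_c - 3.0) ** 2 + (log_gamma + 5.0) ** 2


def clip(point, bounds):
    """Keep a candidate inside the search box."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(point, bounds)]


def pick_best(a, b):
    """Reduce step: keep the (error, flower) pair with the lower error."""
    return a if a[0] <= b[0] else b


def fpa_search(fitness, bounds, n_flowers=20, n_iter=100, p_switch=0.8, seed=0):
    """Minimize `fitness` with a basic Flower Pollination Algorithm."""
    rng = random.Random(seed)
    flowers = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_flowers)]
    # Map step: every flower is scored independently of the others.
    errors = list(map(fitness, flowers))
    best_err, best = reduce(pick_best, zip(errors, flowers))
    history = [best_err]

    for _ in range(n_iter):
        candidates = []
        for x in flowers:
            if rng.random() < p_switch:
                # Global pollination: Levy flight relative to the best flower.
                step = levy_step(rng)
                new = [xi + step * (bi - xi) for xi, bi in zip(x, best)]
            else:
                # Local pollination: move along the difference of two random flowers.
                j, k = rng.sample(range(n_flowers), 2)
                eps = rng.random()
                new = [xi + eps * (a - b)
                       for xi, a, b in zip(x, flowers[j], flowers[k])]
            candidates.append(clip(new, bounds))

        # Map step again, then greedy replacement of flowers that improved.
        new_errors = list(map(fitness, candidates))
        for i, (err, cand) in enumerate(zip(new_errors, candidates)):
            if err < errors[i]:
                flowers[i], errors[i] = cand, err

        best_err, best = reduce(pick_best, zip(errors, flowers))
        history.append(best_err)

    return best, best_err, history


if __name__ == "__main__":
    search_box = [(-5.0, 15.0), (-15.0, 3.0)]  # assumed ranges for log2(C), log2(gamma)
    best, best_err, _ = fpa_search(toy_validation_error, search_box)
    print("best log2(C), log2(gamma):", best, "error:", best_err)
```

In a distributed setting, the scoring step is the natural unit to parallelize, since each candidate parameter pair requires training and validating its own SVM and no candidate depends on another within an iteration.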
