SVM and k-Means Hybrid Method for Textual Data Sentiment Analysis

This paper proposes a hybrid technique that improves the classification accuracy of Support Vector Machines (SVM) through training data sampling and hyperparameter tuning. The technique applies clustering to select the training data and parameter tuning to optimize classifier effectiveness. In all experiments, the proposed method achieved better results than the method presented in the authors' previous work.
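The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how clustering picks the training samples or how parameters are tuned, so the sketch assumes one plausible reading. It runs k-means separately within each class, keeps the points nearest each centroid, and then grid-searches the `C` parameter of a linear SVM with scikit-learn. The toy corpus, the helper names (`make_toy_corpus`, `select_by_clustering`, `tune_linear_svm`), the keep fraction, and the grid values are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

POSITIVE = ["good", "great", "love", "excellent", "happy", "nice"]
NEGATIVE = ["bad", "awful", "hate", "terrible", "sad", "poor"]


def make_toy_corpus(n_per_class=40, seed=0):
    """Hypothetical stand-in for a labelled review corpus (1 = positive)."""
    rng = np.random.RandomState(seed)
    texts, labels = [], []
    for label, vocab in ((1, POSITIVE), (0, NEGATIVE)):
        for _ in range(n_per_class):
            texts.append(" ".join(rng.choice(vocab, size=5)))
            labels.append(label)
    return texts, np.array(labels)


def select_by_clustering(X, y, n_clusters=3, keep_fraction=0.5, seed=0):
    """Assumed selection rule: cluster each class with k-means and keep
    the fraction of each cluster that lies closest to its centroid."""
    keep = []
    for label in np.unique(y):
        rows = np.flatnonzero(y == label)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X[rows])
        # Distance of every sample to the centroid of its own cluster.
        dist = km.transform(X[rows])[np.arange(len(rows)), km.labels_]
        for c in range(n_clusters):
            members = np.flatnonzero(km.labels_ == c)
            n_keep = max(1, int(round(keep_fraction * len(members))))
            keep.extend(rows[members[np.argsort(dist[members])[:n_keep]]])
    return np.sort(np.array(keep, dtype=int))


def tune_linear_svm(X, y, c_grid=(0.01, 0.1, 1.0, 10.0), cv=3):
    """Assumed tuning step: cross-validated grid search over C."""
    search = GridSearchCV(LinearSVC(), {"C": list(c_grid)}, cv=cv, scoring="accuracy")
    search.fit(X, y)
    return search


texts, y = make_toy_corpus()
X = TfidfVectorizer().fit_transform(texts)
selected = select_by_clustering(X, y)
search = tune_linear_svm(X[selected], y[selected])
print(f"kept {len(selected)} of {len(y)} samples, best C = {search.best_params_['C']}")
```

Clustering within each class is a choice made here so that the reduced training set keeps both labels represented. Other selection rules, such as keeping points far from the centroids or near the class boundary, would fit the abstract's description equally well.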
