An efficient hybrid filter and evolutionary wrapper approach for sentiment analysis of various topics on Twitter

Abstract: Sentiment analysis is currently considered one of the most attractive research topics in the field of Natural Language Processing (NLP). Its main objective is to identify the opinions and emotions of users from written content. Although many studies have approached this field using various techniques, it remains a challenging topic with many unsolved difficulties, such as modern accents, slang words, and spelling and grammatical mistakes, which cannot be overcome with traditional methods and sentiment lexicons. In this work, we propose a hybrid machine learning approach to enhance sentiment analysis. We build a classification model over three classes (positive, neutral, and negative) using a Support Vector Machine (SVM) classifier, combined with two feature selection techniques: the ReliefF filter and the Multi-Verse Optimizer (MVO) algorithm as an evolutionary wrapper. We also extract more than 6900 tweets from the Twitter social network to test our work. Our hybrid method is compared against other classifiers and methods in terms of accuracy. Results show that the proposed method outperforms the other techniques and classifiers, obtaining better results on most of the datasets while reducing the number of features by up to 96.85% relative to the original feature set. We also categorize the extracted features into Objective, Subjective, and Emoticon words in order to analyze them after the first and the final feature selection stages and to identify any relations between them. Both feature selection techniques yield very similar results, owing to a number of factors that are explained in this paper.
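The two-stage idea described in the abstract (a filter ranks features first, then a wrapper searches subsets of the survivors using classifier accuracy as fitness) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions, not the authors' implementation: the filter is a simplified Relief weighting (one nearest hit and one nearest miss per sample) rather than full ReliefF, the wrapper is a small population-based bit-mask search standing in for MVO (it does not implement the MVO update rules), and a nearest-centroid classifier stands in for the SVM so the example needs only the standard library. The data, the function names, and all parameter values are hypothetical.

```python
import random

def relief_scores(X, y):
    """Simplified Relief weights: features that differ on the nearest miss
    and agree on the nearest hit receive higher scores."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for i in range(n):
        def dist(j):
            return sum(abs(a - b) for a, b in zip(X[i], X[j]))
        hit = min((j for j in range(n) if j != i and y[j] == y[i]), key=dist)
        miss = min((j for j in range(n) if y[j] != y[i]), key=dist)
        for f in range(d):
            w[f] += abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])
    return [v / n for v in w]

def centroid_accuracy(mask, train, test):
    """Held-out accuracy of a nearest-centroid classifier (SVM stand-in)
    restricted to the features switched on in `mask`."""
    cols = [f for f, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, f in enumerate(cols):
            acc[k] += x[f]
        counts[label] = counts.get(label, 0) + 1
    cents = {c: [s / counts[c] for s in v] for c, v in sums.items()}
    def predict(x):
        return min(cents, key=lambda c: sum((x[f] - cents[c][k]) ** 2
                                            for k, f in enumerate(cols)))
    return sum(predict(x) == label for x, label in test) / len(test)

def wrapper_search(n_features, fitness, rng, pop_size=8, iters=20):
    """Population-based search over binary masks (a stand-in for MVO).
    The best mask found so far is always kept (elitism)."""
    pop = [[1] * n_features]
    pop += [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)[:]
    best_fit = fitness(best)
    for _ in range(iters):
        for u in pop:
            for f in range(n_features):
                r = rng.random()
                if r < 0.3:        # copy this bit from the best mask
                    u[f] = best[f]
                elif r < 0.4:      # random flip for exploration
                    u[f] = 1 - u[f]
        cand = max(pop, key=fitness)
        if fitness(cand) > best_fit:
            best, best_fit = cand[:], fitness(cand)
    return best, best_fit

rng = random.Random(0)
# Hypothetical data: 3 classes, feature 0 equals the class label, 9 noise features.
labels = [i % 3 for i in range(60)]
X = [[float(c)] + [rng.random() for _ in range(9)] for c in labels]

# Stage 1 (filter): rank features and keep the 5 highest-scoring ones.
scores = relief_scores(X, labels)
ranking = sorted(range(len(scores)), key=lambda f: -scores[f])
kept = ranking[:5]

# Stage 2 (wrapper): search subsets of the kept features by held-out accuracy.
data = [([x[f] for f in kept], c) for x, c in zip(X, labels)]
train, test = data[:42], data[42:]
fitness = lambda mask: centroid_accuracy(mask, train, test)
baseline = fitness([1] * len(kept))
mask, best_fit = wrapper_search(len(kept), fitness, rng)
selected = [kept[k] for k, keep in enumerate(mask) if keep]

# Feature reduction relative to the original feature set, in percent.
reduction = 100.0 * (1 - len(selected) / len(X[0]))
print(selected, best_fit, round(reduction, 2))
```

Running the filter before the wrapper shrinks the search space the wrapper has to explore, which is the usual motivation for this kind of hybrid. The final line reports the reduction in the same form as the figure quoted in the abstract, that is, the share of original features that were discarded.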
