Word clustering based on POS feature for efficient Twitter sentiment analysis

With the rapid growth of social networking services on the Internet, a huge amount of information is continuously generated in real time. As a result, sentiment analysis of online reviews and messages has become a popular research topic [1]. In this paper, a novel modified chi-square-based feature clustering and weighting scheme is proposed for the sentiment analysis of Twitter messages. Along with part-of-speech (POS) tagging, the discriminability and dependency of the words in the tagged training dataset are taken into account in the clustering and weighting process. The multinomial Naïve Bayes model is also employed to handle redundant features, and the influence of emotional words is raised to maximize accuracy. Computer simulation with the Sentiment140 workload shows that the proposed scheme significantly outperforms four existing representative sentiment analysis schemes in terms of accuracy, regardless of the size of the training and test data.
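As a rough illustration of the building blocks named in the abstract, the sketch below computes the standard (unmodified) chi-square statistic between a term and a class, and uses it as a per-term weight in a Laplace-smoothed multinomial Naïve Bayes classifier. It is not the paper's modified scheme: the POS-based clustering is omitted, and the function names, the toy data, and the weighting rule `1 + chi_square` are all assumptions made for this example.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, cls):
    """Standard 2x2 chi-square statistic between presence of `term` and class `cls`."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cls = label == cls
        if has_term and in_cls:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document in another class
        elif in_cls:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def train_nb(docs, labels):
    """Collect the counts needed by a multinomial Naive Bayes model."""
    vocab = {t for tokens in docs for t in tokens}
    class_docs = Counter(labels)
    term_counts = {c: Counter() for c in class_docs}
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
    return vocab, class_docs, term_counts


def predict_nb(model, tokens, weights=None):
    """Pick the class with the highest (optionally term-weighted) log-posterior."""
    vocab, class_docs, term_counts = model
    weights = weights or {}
    total_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for c in sorted(class_docs):
        total_terms = sum(term_counts[c].values())
        score = math.log(class_docs[c] / total_docs)
        for t in tokens:
            if t not in vocab:
                continue  # ignore words never seen in training
            # Laplace-smoothed likelihood of term t given class c
            p = (term_counts[c][t] + 1) / (total_terms + len(vocab))
            # Assumed weighting rule: the weight scales the log-likelihood
            score += weights.get(t, 1.0) * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy, hand-made data (hypothetical; not drawn from Sentiment140)
docs = [["love", "this", "phone"], ["great", "love", "it"],
        ["hate", "this", "phone"], ["awful", "hate", "it"]]
labels = ["pos", "pos", "neg", "neg"]

model = train_nb(docs, labels)
weights = {t: 1.0 + chi_square(docs, labels, t, "pos") for t in model[0]}
print(predict_nb(model, ["love", "it"], weights))
```

In this toy setup, words that appear equally often in both classes (such as "this") get a chi-square score of zero and keep the default weight, while class-specific words (such as "love") get a larger weight and therefore more influence on the decision.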

[1]  Wen Wang,et al.  Deep feature weighting in Naive Bayes for Chinese text classification , 2016, 2016 4th International Conference on Cloud Computing and Intelligence Systems (CCIS).

[2]  Guo Qiang An Effective Algorithm for Improving the Performance of Naive Bayes for Text Classification , 2010 .

[3]  Jun Zhang,et al.  Enhanced Twitter Sentiment Analysis by Using Feature Selection and Combination , 2015, 2015 International Symposium on Security and Privacy in Social Networks and Big Data (SocialSec).

[4]  Hee Yong Youn,et al.  A Novel Feature-Based Text Classification Improving the Accuracy of Twitter Sentiment Analysis , 2017, CSA/CUTE.

[5]  Maria Virvou,et al.  Sentiment analysis of Facebook statuses using Naive Bayes classifier for language learning , 2013, IISA 2013.

[6]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[7]  Abdellah Madani,et al.  An improved Chi-sqaure feature selection for Arabic text classification using decision tree , 2016, 2016 11th International Conference on Intelligent Systems: Theories and Applications (SITA).

[8]  Yuan Tian,et al.  Chi-square Statistics Feature Selection Based on Term Frequency and Distribution for Text Categorization , 2015 .

[9]  Tinghuai Ma,et al.  A novel subgraph $K^{+}$-isomorphism method in social , 2017, Soft Computing.

[10]  Yuhui Zheng,et al.  Student’s t-Hidden Markov Model for Unsupervised Learning Using Localized Feature Selection , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[11]  Yi Pan,et al.  A Comprehensive Review of Emerging Computational Methods for Gene Identification , 2016, J. Inf. Process. Syst..

[12]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[13]  Xiao Wang,et al.  World Cup 2014 in the Twitter World: A big data analysis of sentiments in U.S. sports fans' tweets , 2015, Comput. Hum. Behav..

[14]  Qian Wang,et al.  A Secure and Dynamic Multi-Keyword Ranked Search Scheme over Encrypted Cloud Data , 2016, IEEE Transactions on Parallel and Distributed Systems.

[15]  Shiguo Lian,et al.  Forensics feature analysis in quaternion wavelet domain for distinguishing photographic images and computer graphics , 2017, Multimedia Tools and Applications.

[16]  Song Wei,et al.  A novel feature-based method for sentiment analysis of Chinese product reviews , 2014, China Communications.

[17]  Tejashri Inadarchand Jain,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2010 .

[18]  Bin Gu,et al.  A Robust Regularization Path Algorithm for $\nu$-Support Vector Classification , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[19]  Nada Lavrac,et al.  Stream-based active learning for sentiment analysis in the financial domain , 2014, Inf. Sci..

[20]  Zakaria Elberrichi,et al.  Feature selection for text classification using genetic algorithms , 2016, 2016 8th International Conference on Modelling, Identification and Control (ICMIC).

[21]  Maria Virvou,et al.  Evaluation of ensemble-based sentiment classifiers for Twitter data , 2016, 2016 7th International Conference on Information, Intelligence, Systems & Applications (IISA).

[22]  Peng Jin,et al.  Fast reference frame selection based on content similarity for low complexity HEVC encoder , 2016, J. Vis. Commun. Image Represent..

[23]  Jaspreet Singh,et al.  Optimization of sentiment analysis using machine learning classifiers , 2017, Human-centric Computing and Information Sciences.

[24]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[25]  Xingming Sun,et al.  Enabling Personalized Search over Encrypted Outsourced Data with Efficiency Improvement , 2016, IEEE Transactions on Parallel and Distributed Systems.

[26]  Yaregal Assabie,et al.  A Hybrid Approach to the Development of Part-of-Speech Tagger for Kafi-noonoo Text , 2014, CICLing.

[27]  Hima Suresh,et al.  An unsupervised fuzzy clustering method for twitter sentiment analysis , 2016, 2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS).

[28]  Zhihua Xia,et al.  A Secure and Dynamic Multi-Keyword Ranked Search Scheme over Encrypted Cloud Data , 2016, IEEE Transactions on Parallel and Distributed Systems.

[29]  Huang Zou,et al.  Sentiment Classification Using Machine Learning Techniques with Syntax Features , 2015, 2015 International Conference on Computational Science and Computational Intelligence (CSCI).

[30]  Gregory M. Provan,et al.  A Comparison of Induction Algorithms for Selective and non-Selective Bayesian Classifiers , 1995, ICML.

[31]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[32]  Xingming Sun,et al.  Structural Minimax Probability Machine , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[33]  Maria Virvou,et al.  Comparative Evaluation of Algorithms for Sentiment Analysis over Social Networking Services , 2017, J. Univers. Comput. Sci..

[34]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[35]  Yao Wang,et al.  LED: A fast overlapping communities detection algorithm based on structural clustering , 2016, Neurocomputing.

[36]  Xiaola Lin,et al.  DLRankSVM: an efficient distributed algorithm for linear RankSVM , 2017, The Journal of Supercomputing.

[37]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[38]  Bin Li,et al.  Bio-inspired ant colony optimization based clustering algorithm with mobile sinks for applications in consumer home automation networks , 2015, IEEE Transactions on Consumer Electronics.

[39]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[40]  Timothy O'Keefe Feature Selection and Weighting Methods in Sentiment Analysis , 2009 .

[41]  Young-Koo Lee,et al.  A Load-Balancing and Energy-Aware Clustering Algorithm in Wireless Ad-Hoc Networks , 2005, EUC Workshops.

[42]  Xingming Sun,et al.  Fast Motion Estimation Based on Content Property for Low-Complexity H.265/HEVC Encoder , 2016, IEEE Transactions on Broadcasting.

[43]  Swagatam Das,et al.  Simultaneous feature selection and weighting - An evolutionary multi-objective optimization approach , 2015, Pattern Recognit. Lett..

[44]  Lin Chen,et al.  Term-frequency Based Feature Selection Methods for Text Categorization , 2010, 2010 Fourth International Conference on Genetic and Evolutionary Computing.

[45]  Estevam R. Hruschka,et al.  Tweet sentiment analysis with classifier ensembles , 2014, Decis. Support Syst..

[46]  Yeresime Suresh,et al.  Software quality assessment for open source software using logistic & Naive Bayes classifier , 2016, 2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS).

[47]  Antonio J. Plaza,et al.  Parallel and Distributed Dimensionality Reduction of Hyperspectral Data on Cloud Computing Architectures , 2016, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[48]  Sai Ji,et al.  Energy-efficient cluster-based dynamic routes adjustment approach for wireless sensor networks with mobile sinks , 2017, The Journal of Supercomputing.

[49]  Guo Qiang An Effective Algorithm for Improving the Performance of Naive Bayes for Text Classification , 2010, 2010 Second International Conference on Computer Research and Development.

[50]  Neeraj Sharma,et al.  Text classification using combined sparse representation classifiers and support vector machines , 2016, 2016 4th International Symposium on Computational and Business Intelligence (ISCBI).

[51]  Andrea Esuli,et al.  SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining , 2010, LREC.

[52]  Maria Virvou,et al.  The effect of preprocessing techniques on Twitter sentiment analysis , 2016, 2016 7th International Conference on Information, Intelligence, Systems & Applications (IISA).

[53]  Jin Wang,et al.  Privacy-Preserving Smart Similarity Search Based on Simhash over Encrypted Data in Cloud Computing , 2015 .

[54]  Rubén San-Segundo-Hernández,et al.  Feature extraction for robust physical activity recognition , 2017, Human-centric Computing and Information Sciences.