SWIMS: Semi-supervised subjective feature weighting and intelligent model selection for sentiment analysis

Abstract Sentiment analysis, also called opinion mining, is currently one of the most studied research fields. Its aim is to analyze the public's sentiments, opinions, and attitudes towards elements such as topics, products, individuals, organizations, or services. Sentiment classification can be achieved with machine learning methods, lexicon-based methods, or a combination of both. To improve the performance of domain-independent lexicons, this research combines machine learning with a lexicon-based approach in a new framework called SWIMS, which determines feature weights from a well-known general-purpose sentiment lexicon, SentiWordNet. A support vector machine is used to learn the feature weights, and an intelligent model selection approach is employed to enhance classification performance. Features are selected based on their subjectivity, and the effect of feature selection with respect to part-of-speech information is studied extensively. Seven benchmark datasets are used in this research, including the Large Movie Review Dataset, the Multi-Domain Sentiment Dataset, and the Cornell movie review dataset, all of which are available online. An in-depth performance comparison is conducted against state-of-the-art machine learning approaches and lexicon-based methods. The evaluation of performance measures shows that the proposed framework outperforms the compared techniques for sentiment analysis.
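The subjectivity-based feature selection and lexicon-derived weighting described in the abstract can be illustrated with a minimal sketch. This is not the SWIMS implementation: the toy lexicon, its scores, the threshold, and all function names below are assumptions made for illustration, and the sketch omits the support vector machine training and the model selection step. It relies only on the SentiWordNet convention that each entry carries a positive and a negative score, with objectivity equal to one minus their sum.

```python
from collections import Counter

# Hypothetical toy lexicon: (word, POS) -> (positive, negative) scores in [0, 1].
# The numbers are invented for illustration; they are NOT real SentiWordNet entries.
TOY_LEXICON = {
    ("good", "a"): (0.75, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("love", "v"): (0.5, 0.0),
    ("boring", "a"): (0.0, 0.5),
    ("movie", "n"): (0.0, 0.0),
    ("plot", "n"): (0.0, 0.0),
}


def subjectivity(pos_score, neg_score):
    """Subjectivity as total polar mass, i.e. 1 - objectivity."""
    return pos_score + neg_score


def select_features(tagged_tokens, threshold=0.25, allowed_pos=("a", "v", "r", "n")):
    """Count tokens whose POS is allowed and whose subjectivity meets the threshold."""
    counts = Counter()
    for word, pos in tagged_tokens:
        scores = TOY_LEXICON.get((word, pos))
        if scores is None or pos not in allowed_pos:
            continue  # unknown word or filtered part of speech
        if subjectivity(*scores) >= threshold:
            counts[(word, pos)] += 1
    return counts


def weighted_vector(tagged_tokens, **kwargs):
    """Weight each selected feature by count * (positive - negative) score."""
    counts = select_features(tagged_tokens, **kwargs)
    return {
        feat: n * (TOY_LEXICON[feat][0] - TOY_LEXICON[feat][1])
        for feat, n in counts.items()
    }


def lexicon_polarity(vector):
    """Purely lexicon-based decision: the sign of the summed feature weights."""
    total = sum(vector.values())
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# A POS-tagged toy review (tags: a = adjective, n = noun, v = verb, r = adverb).
review = [("good", "a"), ("movie", "n"), ("boring", "a"), ("plot", "n"), ("good", "a")]
vec = weighted_vector(review)
print(vec, lexicon_polarity(vec))
```

In a hybrid setup of the kind the abstract describes, vectors like `vec` would serve as input to a classifier rather than being summed directly; the `allowed_pos` argument is one way to study how restricting features by part of speech changes the result.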
