SentiMI: Introducing point-wise mutual information with SentiWordNet to improve sentiment polarity detection

Abstract Supervised learning has attracted much attention in recent years. As a consequence, many state-of-the-art algorithms are domain dependent, because they require a labeled training corpus to learn domain features. Building such labeled corpora is a cumbersome task in itself. For text sentiment detection, however, SentiWordNet (SWN) may be used instead. It is a vocabulary in which terms are arranged in synonym groups called synsets. This research uses SentiWordNet as the labeled corpus for training. A sentiment dictionary, SentiMI, is built from the mutual information calculated from these terms. A complete framework is developed by applying feature selection and extracting mutual information from SentiMI for the selected features. The proposed framework is trained, tested, and evaluated on a large dataset of 50,000 movie reviews. Compared with the baseline SentiWordNet classifier, it achieves notable improvements of 7% in accuracy, 14% in specificity, and 8% in F-measure. A comparison with state-of-the-art classifiers on the widely used Cornell Movie Review dataset also demonstrates the effectiveness of the proposed approach.
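
To make the idea of mutual information between terms and sentiment classes concrete, the following is a minimal sketch of the standard point-wise mutual information, PMI(t, c) = log2(P(t, c) / (P(t) P(c))), computed from labeled term observations. The abstract does not give the exact formulation used to build SentiMI, so the function name `pmi_table`, the `pos`/`neg` labels, and the toy observations are all assumptions made for illustration and are not taken from SentiWordNet or from the paper.

```python
import math
from collections import Counter


def pmi_table(observations):
    """Return {(term, label): PMI} for a list of (term, label) pairs.

    Illustrative only: uses the textbook definition
    PMI(t, c) = log2(P(t, c) / (P(t) * P(c))) with probabilities
    estimated by relative frequency over the observations.
    """
    n = len(observations)
    joint = Counter(observations)
    term_counts = Counter(term for term, _ in observations)
    label_counts = Counter(label for _, label in observations)

    table = {}
    for (term, label), count in joint.items():
        p_joint = count / n
        p_term = term_counts[term] / n
        p_label = label_counts[label] / n
        table[(term, label)] = math.log2(p_joint / (p_term * p_label))
    return table


# Hypothetical toy data: each pair stands for one labeled occurrence
# of a term, e.g. one synset in which the term is tagged pos or neg.
observations = [
    ("good", "pos"), ("good", "pos"), ("good", "neg"),
    ("bad", "neg"), ("bad", "neg"), ("bad", "pos"),
    ("great", "pos"), ("awful", "neg"),
]

scores = pmi_table(observations)
for (term, label), value in sorted(scores.items()):
    print(f"PMI({term}, {label}) = {value:.3f}")
```

A positive score means the term co-occurs with the class more often than independence would predict, and a negative score means less often. Pairs that never co-occur are simply absent from the table in this sketch; a practical dictionary would need a smoothing or default-value policy for them.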
