Senti-CS: Building a lexical resource for sentiment analysis using subjective feature selection and normalized Chi-Square-based feature weight generation

Sentiment analysis is the detection of the sentiment content of text using natural language processing. Natural language processing is challenging because of syntactic ambiguity, the difficulty of named entity recognition, slang, jargon, sarcasm, abbreviations and context sensitivity. Sentiment analysis can be performed using supervised as well as unsupervised approaches. As the amount of data grows, unsupervised approaches become vital because they reduce learning time and the need for a labelled dataset. Sentiment lexicons make it easy to apply unsupervised algorithms to text classification. SentiWordNet is a lexical resource widely employed by researchers for sentiment analysis and polarity classification. However, its reported performance leaves room for improvement. This research aims to improve on the performance of SentiWordNet 3.0 by using it as a labelled corpus to build another sentiment lexicon, named Senti-CS. Part-of-speech information, usage-based ranks and sentiment scores are used to calculate a Chi-Square-based feature weight for each unique subjective term/part-of-speech pair extracted from SentiWordNet 3.0. This weight is then normalized to the range -1 to +1 using min-max normalization. A Senti-CS-based sentiment analysis framework is presented and applied to a large dataset of 50,000 movie reviews. The results are compared with baseline SentiWordNet, Mutual Information and Information Gain techniques. A state-of-the-art comparison is performed on the Cornell movie review dataset. Analysis of the results indicates that the proposed approach outperforms state-of-the-art classifiers.
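
The weighting and normalization steps described above can be illustrated with a minimal sketch. It uses the standard 2x2 contingency-table form of the Chi-Square statistic common in text feature selection, followed by generic min-max scaling to [-1, +1]. The abstract does not give the exact Senti-CS formulation, so the function names, the toy counts, and the choice to sign the statistic by the direction of association are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (term, part-of-speech tag); hypothetical key layout


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-Square statistic for a 2x2 term/class contingency table.

    a: positive entries containing the term
    b: negative entries containing the term
    c: positive entries not containing the term
    d: negative entries not containing the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - c * b) ** 2 / denom


def signed_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Assumption: attach a sign so negative association gives a negative weight."""
    sign = 1.0 if a * d >= c * b else -1.0
    return sign * chi_square(a, b, c, d)


def min_max_scale(weights: Dict[Key, float],
                  lo: float = -1.0, hi: float = 1.0) -> Dict[Key, float]:
    """Linearly rescale all weights into the interval [lo, hi]."""
    w_min, w_max = min(weights.values()), max(weights.values())
    if w_max == w_min:
        return {k: 0.0 for k in weights}  # all weights equal: no spread to scale
    span = w_max - w_min
    return {k: lo + (v - w_min) * (hi - lo) / span for k, v in weights.items()}


# Toy counts (a, b, c, d), invented purely for illustration.
counts = {
    ("good", "a"): (10, 0, 0, 10),
    ("plain", "a"): (5, 5, 5, 5),
    ("bad", "a"): (0, 10, 10, 0),
}
raw = {key: signed_chi_square(*table) for key, table in counts.items()}
lexicon = min_max_scale(raw)
```

With these toy counts the raw signed weights are symmetric around zero, so the neutral term lands at 0 after scaling. In general, min-max scaling maps the smallest raw weight to -1 and the largest to +1, and a raw weight of 0 need not map to 0.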
