Feature Reduction Using Standard Deviation with Different Subsets Selection in Sentiment Analysis

The growth of the internet and the web has produced a huge volume of online information, including users’ textual opinions and reviews. Representing this text at the document level yields a large number of features and, consequently, a high-dimensional feature space. In this paper, we propose an algorithm based on standard deviation to reduce that dimensionality. The algorithm constructs feature subsets according to the dispersion of the features; in other words, it selects the features with the highest standard deviation to build the subsets. We evaluate the approach experimentally on a sentiment analysis dataset, estimating the performance of an ensemble of classifiers when dimensionality reduction is applied to the input space using three different methods. Different types of base classifiers and classifier combination rules were also used.
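The selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the use of population standard deviation, the toy term-count matrix, and the choice of nested top-k subsets are all assumptions made for illustration.

```python
from statistics import pstdev


def rank_features_by_std(X):
    """Return column indices of X ordered by descending standard deviation.

    X is a list of rows (documents); each row is a list of feature values.
    Population standard deviation is an assumption; the abstract does not
    say which estimator is used.
    """
    n_features = len(X[0])
    stds = [pstdev([row[j] for row in X]) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: stds[j], reverse=True)


def build_subsets(X, sizes):
    """Build one feature subset per requested size from the std ranking.

    Nested top-k subsets are a hypothetical choice made for this sketch.
    """
    ranking = rank_features_by_std(X)
    return {k: ranking[:k] for k in sizes}


def project(X, columns):
    """Keep only the selected feature columns of X."""
    return [[row[j] for j in columns] for row in X]


# Toy document-term matrix: 4 documents, 4 features (made-up values).
X = [
    [1, 0, 3, 0],
    [1, 2, 0, 0],
    [1, 0, 3, 1],
    [1, 2, 0, 0],
]

subsets = build_subsets(X, sizes=(1, 2))
reduced = project(X, subsets[2])
```

In this toy matrix the first feature is constant, so its standard deviation is zero and it is ranked last. Each reduced matrix could then be passed to a separate base classifier whose outputs are combined by an ensemble rule.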
