Ordinal-based and frequency-based integration of feature selection methods for sentiment analysis

Feature subset selection with the aim of reducing dependency of feature selection techniques and obtaining a high-quality minimal feature subset from a real-world domain is the main task of this research. For this end, firstly, two types of feature representation are presented for feature sets, namely unigram-based and part-of-speech based feature sets. Secondly, five methods of feature ranking are employed for creating feature vectors. Finally, we propose two methods for the integration feature vectors and feature subsets. An ordinal-based integration of different feature vectors (OIFV) is proposed in order to obtain a new feature vector. The new feature vector depends on the order of features in the old vectors. A frequency-based integration of different feature subsets (FIFS) with most effective features, which are obtained from a hybrid filter and wrapper methods in the feature selection task, is then proposed. In addition, four well-known text classification algorithms are employed as classifiers in the wrapper method for the selection of the feature subsets. A wide range of comparative experiments on five widely-used datasets in sentiment analysis were carried out. The experiments demonstrate that proposed methods can effectively improve the performance of sentiment classification. These results also show that proposed part-of-speech patterns are more effective in their classification accuracy compared to unigram-based features.

[1]  Xiaohui Yu,et al.  ARSA: a sentiment-aware model for predicting sales performance using blogs , 2007, SIGIR.

[2]  Lei Zhang,et al.  Sentiment Analysis and Opinion Mining , 2017, Encyclopedia of Machine Learning and Data Mining.

[3]  Songbo Tan,et al.  A survey on sentiment detection of reviews , 2009, Expert Syst. Appl..

[4]  Harun Uguz,et al.  A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm , 2011, Knowl. Based Syst..

[5]  Yiming Yang,et al.  High-performing feature selection for text classification , 2002, CIKM '02.

[6]  T. Warren Liao,et al.  Feature extraction and selection from acoustic emission signals with an application in grinding wheel condition monitoring , 2010, Eng. Appl. Artif. Intell..

[7]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[8]  Maite Taboada,et al.  Lexicon-Based Methods for Sentiment Analysis , 2011, CL.

[9]  Santanu Kumar Rath,et al.  Classification of sentiment reviews using n-gram machine learning approach , 2016, Expert Syst. Appl..

[10]  Nicoletta Dessì,et al.  Similarity of feature selection methods: An empirical study across data intensive classification tasks , 2015, Expert Syst. Appl..

[11]  Rui Xia,et al.  Ensemble of feature sets and classification algorithms for sentiment classification , 2011, Inf. Sci..

[12]  Pramod Kumar Singh,et al.  Hybrid dimension reduction by integrating feature selection with feature extraction method for text clustering , 2015, Expert Syst. Appl..

[13]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[14]  Chu-Ren Huang,et al.  A Framework of Feature Selection Methods for Text Categorization , 2009, ACL.

[15]  Pedro Larrañaga,et al.  A review of feature selection techniques in bioinformatics , 2007, Bioinform..

[16]  Mike Wells,et al.  Structured Models for Fine-to-Coarse Sentiment Analysis , 2007, ACL.

[17]  Ioannis Korkontzelos,et al.  Reviewing and Evaluating Automatic Term Recognition Techniques , 2008, GoTAL.

[18]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[19]  Kentaro Inui,et al.  Dependency Tree-based Sentiment Classification using CRFs with Hidden Variables , 2010, NAACL.

[20]  Hakan Altinçay,et al.  A novel framework for termset selection and weighting in binary text classification , 2014, Eng. Appl. Artif. Intell..

[21]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[22]  Alistair Kennedy,et al.  SENTIMENT CLASSIFICATION of MOVIE REVIEWS USING CONTEXTUAL VALENCE SHIFTERS , 2006, Comput. Intell..

[23]  Bing Liu,et al.  Mining Opinion Features in Customer Reviews , 2004, AAAI.

[24]  Gerhard Weikum,et al.  The Bag-of-Opinions Method for Review Rating Prediction from Sparse Text Patterns , 2010, COLING.

[25]  Jason Baldridge,et al.  Twitter Polarity Classification with Label Propagation over Lexical Links and the Follower Graph , 2011, ULNLP@EMNLP.

[26]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[27]  Roliana Ibrahim,et al.  Feature Reduction Using Standard Deviation with Different Subsets Selection in Sentiment Analysis , 2014, ACIIDS.

[28]  Diego Reforgiato Recupero,et al.  Frame-Based Detection of Opinion Holders and Topics: A Model and a Tool , 2014, IEEE Computational Intelligence Magazine.

[29]  Eamonn J. Keogh,et al.  Time series shapelets: a new primitive for data mining , 2009, KDD.

[30]  Shlomo Argamon,et al.  Using appraisal groups for sentiment analysis , 2005, CIKM '05.

[31]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.

[32]  Asif Ekbal,et al.  Combining feature selection and classifier ensemble using a multiobjective simulated annealing approach: application to named entity recognition , 2012, Soft Computing.

[33]  Jin Zhang,et al.  An empirical study of sentiment analysis for chinese documents , 2008, Expert Syst. Appl..

[34]  Roliana Ibrahim,et al.  A comparative study on sentiment analysis , 2014 .

[35]  Roliana Ibrahim,et al.  A Novel Feature Reduction Method in Sentiment Analysis , 2014 .

[36]  Chun Chen,et al.  Opinion Word Expansion and Target Extraction through Double Propagation , 2011, CL.

[37]  Filiberto Pla,et al.  Supervised feature selection by clustering using conditional mutual information-based distances , 2010, Pattern Recognit..

[38]  Marie-Francine Moens,et al.  A machine learning approach to sentiment analysis in multilingual Web texts , 2009, Information Retrieval.