Impact of Feature Selection Techniques for Tweet Sentiment Classification

Sentiment analysis of tweets is a powerful application of social media mining that can be used for a variety of social sensing tasks. Common feature engineering techniques frequently generate a large number of features to represent tweets. Many of these features may degrade classifier performance and increase computational cost. Feature selection techniques can be used to select a reduced subset of the most informative features, lowering the computational cost of training a classifier and potentially improving classification performance. Despite its benefits, feature selection has received little attention within the tweet sentiment domain. We study the impact of ten filter-based feature selection techniques (rankers) on classification performance, using ten feature subset sizes and four different learners. Our experimental results demonstrate that feature selection can significantly improve classification performance compared with using no feature selection. Additionally, both the choice of ranker and the feature subset size significantly impact classifier performance. To the best of our knowledge, this is the first work that extensively studies the effect of feature selection on tweet sentiment classification.
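To make the idea of a filter-based ranker concrete, the following is a minimal sketch that scores each term independently of any classifier and keeps the top k. The chi-squared statistic is used purely as an illustration and is an assumption here; the abstract does not name the ten rankers studied. The function names (`chi2_scores`, `select_top_k`) and the four toy tweets are hypothetical and not taken from the study.

```python
from collections import Counter

def chi2_scores(docs, labels):
    """Score each term with the chi-squared statistic of its
    2x2 (term present/absent) x (positive/negative) table."""
    n = len(docs)
    n_pos = sum(labels)
    doc_terms = [set(d.lower().split()) for d in docs]
    df = Counter()      # number of documents containing the term
    df_pos = Counter()  # number of positive documents containing the term
    for terms, y in zip(doc_terms, labels):
        for t in terms:
            df[t] += 1
            if y == 1:
                df_pos[t] += 1
    scores = {}
    for t in df:
        a = df_pos[t]      # term present, positive class
        b = df[t] - a      # term present, negative class
        c = n_pos - a      # term absent, positive class
        d = n - n_pos - b  # term absent, negative class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical toy data: 1 = positive sentiment, 0 = negative sentiment.
toy_docs = ["love this phone", "great day love it",
            "hate this phone", "awful day hate it"]
toy_labels = [1, 1, 0, 0]

toy_scores = chi2_scores(toy_docs, toy_labels)
selected = select_top_k(toy_scores, 2)
```

In an experiment of the kind described above, the subset size `k` and the scoring function would be varied, and a learner would be trained only on the selected terms for each combination.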
