An efficient approach for opinion mining from a skewed Twitter corpus using undersampling

Data mining is an effective technique for discovering knowledge in existing databases. However, the performance of existing algorithms degrades when they are applied to imbalanced datasets, and the imbalanced nature of Twitter datasets likewise hinders efficient knowledge discovery. In this paper, we propose an efficient approach for knowledge discovery from imbalanced datasets, designed specifically for opinion mining. The proposed Under Sampled Imbalance Data Learning (USIDL) approach uses a new technique for undersampling instances from the majority subset. Experimental results suggest that the proposed approach outperforms the existing C4.5 algorithm on seven evaluation metrics.
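The abstract does not describe how USIDL decides which majority-class instances to drop. For orientation only, here is a minimal sketch of plain random undersampling, the simplest baseline for this step: majority-class instances are discarded at random until every class is as large as the minority class. This is a generic technique swapped in for illustration, not the USIDL selection rule. The function name `random_undersample`, the `seed` parameter, and the toy tweets and labels are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop instances so every class matches the minority-class size.

    Generic random undersampling for illustration only; this is NOT the
    USIDL selection technique, which the abstract does not specify.
    """
    rng = random.Random(seed)          # fixed seed -> reproducible subset
    target = min(Counter(y).values())  # size of the smallest (minority) class

    # Group instance indices by their class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)

    # Sample each class without replacement down to the minority size.
    # For the minority class itself this keeps every instance.
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, target))

    keep.sort()  # preserve the original corpus order
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Hypothetical skewed toy corpus: 8 negative tweets, 2 positive tweets.
    tweets = [f"tweet {i}" for i in range(10)]
    labels = ["neg"] * 8 + ["pos"] * 2
    X_bal, y_bal = random_undersample(tweets, labels, seed=1)
    print(Counter(y_bal))  # both classes now have 2 instances
```

Sampling without replacement keeps every retained tweet distinct, and sorting the kept indices preserves the corpus order. The balanced subset could then be given to any classifier, such as a C4.5 decision tree, although random removal can discard informative majority-class tweets, which is the weakness a more selective undersampling scheme aims to address.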
