Automatically Building a Corpus for Sentiment Analysis on Indonesian Tweets

The popularity of the user generated content, such as Twitter, has made it a rich source for the sentiment analysis and opinion mining tasks. This paper presents our study in automatically building a training corpus for the sentiment analysis on Indonesian tweets. We start with a set of seed sentiment corpus and subsequently expand them using a classifier model whose parameters are estimated using the Expectation and Maximization (EM) framework. We apply our automatically built corpus to perform two tasks, namely opinion tweet extraction and tweet polarity classification using various machine learning approaches. Experiment result shows that a classifier model trained on our data, which is automatically constructed using our proposed method, outperforms the baseline system in terms of opinion tweet extraction and tweet polarity classification.

[1]  Huan Liu,et al.  Exploiting social relations for sentiment analysis in microblogging , 2013, WSDM.

[2]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[3]  Joemon M. Jose,et al.  An interactive interface for visualizing events on Twitter , 2014, SIGIR.

[4]  Alan F. Smeaton,et al.  Classifying sentiment in microblogs: is brevity an advantage? , 2010, CIKM.

[5]  Vaibhavi N Patodkar,et al.  Twitter as a Corpus for Sentiment Analysis and Opinion Mining , 2016 .

[6]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[7]  MatsuoYutaka,et al.  Tweet Analysis for Real-Time Event Detection and Earthquake Reporting System Development , 2013 .

[8]  Johan Setiawan,et al.  Using Text Mining to Analyze Mobile Phone Provider Service Quality (Case Study: Social Media Twitter) , 2014 .

[9]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[10]  Mirna Adriani,et al.  Sentiment Lexicon Generation for an Under-Resourced Language , 2014, Int. J. Comput. Linguistics Appl..

[11]  Yutaka Matsuo,et al.  Tweet Analysis for Real-Time Event Detection and Earthquake Reporting System Development , 2013, IEEE Transactions on Knowledge and Data Engineering.

[12]  Owen Rambow,et al.  Sentiment Analysis of Twitter Data , 2011 .

[13]  Ari Rappoport,et al.  Enhanced Sentiment Learning Using Twitter Hashtags and Smileys , 2010, COLING.

[14]  Albert Bifet,et al.  Sentiment Knowledge Discovery in Twitter Streaming Data , 2010, Discovery Science.

[15]  Fei Liu,et al.  Application of a clustering method on sentiment analysis , 2012, J. Inf. Sci..

[16]  Adam L. Berger,et al.  A Maximum Entropy Approach to Natural Language Processing , 1996, CL.

[17]  Preslav Nakov,et al.  SemEval-2013 Task 2: Sentiment Analysis in Twitter , 2013, *SEMEVAL.

[18]  Isabell M. Welpe,et al.  Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment , 2010, ICWSM.

[19]  Johanna D. Moore,et al.  Twitter Sentiment Analysis: The Good the Bad and the OMG! , 2011, ICWSM.

[20]  Paulina Aliandu SENTIMENT ANALYSIS ON INDONESIAN TWEET , 2013 .

[21]  Tiejun Zhao,et al.  Target-dependent Twitter Sentiment Classification , 2011, ACL.

[22]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[23]  Bing Liu,et al.  Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data , 2006, Data-Centric Systems and Applications.