Sentiment Analysis for Low Resource Languages: A Study on Informal Indonesian Tweets

This paper describes our attempt to build a sentiment analysis system for Indonesian tweets. Such a system lets us computationally identify and study the sentiments and opinions expressed in a text or document. We built the model from four thousand manually labeled tweets collected in February and March 2016. Because tweet content varies widely, we classify tweets into eight groups in total, including pos(itive), neg(ative), and neu(tral). In the end, we obtained 73.2% accuracy with a Long Short-Term Memory (LSTM) model without a normalizer.
