Twitter Sentiment Analysis in Under-Resourced Languages using Byte-Level Recurrent Neural Model

Sentiment analysis in non-English languages can be more challenging than in English because publicly available resources for building high-accuracy prediction models are scarce. To alleviate this under-resourced problem, this article proposes using a byte-level recurrent neural model to generate text representations for Twitter sentiment analysis in the Indonesian language. Because the main part of the proposed model's training is unsupervised and does not require much labeled data, the approach can scale by using the large amounts of unlabeled data that are easily gathered on the Internet, without much dependence on human-generated resources. This paper also introduces an Indonesian dataset for general sentiment analysis. It consists of 10,806 tweets selected from a total of 454,559 tweets gathered directly from Twitter using the Twitter API. The 10,806 tweets are classified into three categories: positive, negative, and neutral. This dataset could help the development of Indonesian sentiment analysis, especially general sentiment analysis, and encourage others to publish similar datasets in the future.
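As a rough illustration of what "byte-level" input means here, the sketch below turns a tweet into a fixed-length sequence of UTF-8 byte values, which is the kind of input a byte-level recurrent model could consume without any language-specific tokenizer or lexicon. The function name `tweet_to_bytes`, the `max_len` parameter, and the zero-padding scheme are assumptions made for this example; the abstract does not specify the paper's actual preprocessing.

```python
def tweet_to_bytes(text: str, max_len: int = 64) -> list[int]:
    """Encode text as UTF-8 byte values (0-255), truncated or zero-padded to max_len.

    Hypothetical helper: the padding value 0 and the truncation rule are
    illustrative choices, not taken from the paper.
    """
    raw = list(text.encode("utf-8"))[:max_len]  # truncation may split a multi-byte character
    return raw + [0] * (max_len - len(raw))     # pad short inputs with zeros


# An Indonesian example tweet ending in an emoji (U+1F60B), which occupies 4 bytes.
example = "Makanannya enak \U0001F60B"
ids = tweet_to_bytes(example, max_len=24)
print(ids)
```

Because every input is reduced to at most 256 distinct symbols, the same encoding works for Indonesian, English, emoji, and misspelled or slang words alike, which is one reason a byte-level model suits languages with few curated resources.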
