Emotion Classification on Indonesian Twitter Dataset

The rapid growth of Twitter usage has led many researchers to use Twitter data for a range of purposes, including emotion analysis. However, standard datasets for emotion analysis are scarce for under-resourced languages, Indonesian in particular. In this study, we build a publicly available Indonesian Twitter dataset for the emotion classification task. In addition, we conduct feature engineering to determine the best features for emotion classification. The features used in this research are lexicon-based, Bag-of-Words, word embedding, orthography, and Part-of-Speech (POS) tag features. We test these features on two datasets with different characteristics, using F1-score as the evaluation metric. Our experiments show that combining all proposed features on our dataset achieves an F1-score of 69.73%, which outperforms the baseline model by 26.64%.
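As a rough illustration of the two ideas in the abstract, combining several feature groups into one representation and scoring predictions with F1, the following minimal sketch uses only the Python standard library. The function names, the specific orthographic cues, the emotion labels, and the choice of macro averaging are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import string
from collections import Counter


def bow_features(text):
    """Bag-of-Words counts over lowercased, punctuation-stripped tokens."""
    tokens = [tok.strip(string.punctuation) for tok in text.lower().split()]
    return {f"bow:{tok}": n for tok, n in Counter(t for t in tokens if t).items()}


def orthography_features(text):
    """Hypothetical orthographic cues (assumed, not taken from the paper)."""
    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        "orth:exclamations": text.count("!"),
        "orth:questions": text.count("?"),
        "orth:upper_ratio": upper / len(letters) if letters else 0.0,
    }


def combine_features(text):
    """Merge feature groups into one dict; prefixes keep the groups distinct."""
    features = bow_features(text)
    features.update(orthography_features(text))
    return features


def f1_per_class(gold, pred):
    """Per-label F1 = 2PR / (P + R), with 0.0 when a term is undefined."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 (macro averaging is an assumption)."""
    scores = f1_per_class(gold, pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    print(combine_features("Senang sekali HARI ini!"))
    # Illustrative labels only; not the dataset's actual label set.
    gold = ["joy", "joy", "anger", "fear"]
    pred = ["joy", "anger", "anger", "fear"]
    print(f1_per_class(gold, pred))
    print(round(macro_f1(gold, pred), 4))
```

Prefixing each key with its feature group (`bow:`, `orth:`) lets further groups, such as lexicon or POS tag counts, be merged into the same dictionary without key collisions before the result is passed to a classifier.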
