Word Embedding Comparison for Indonesian Language Sentiment Analysis

The development of information technology has dramatically increased the volume of data produced. Large amounts of data are available on the internet, including reviews of products and services. As the volume of data grows, a system is needed to process it. Sentiment analysis is a Natural Language Processing (NLP) task that can help assess the quality of a service, including hotel services. This paper performs sentiment analysis on hotel review data obtained from the Traveloka website. The data are classified using the Long Short-Term Memory (LSTM) algorithm. To get better results, the authors use word embeddings to convert words into vectors. This study compares the performance of several word embedding methods: word2vec Continuous Bag of Words (CBOW), word2vec skip-gram, doc2vec, and GloVe. The experimental results show that GloVe achieves the highest accuracy, 95.52%, while the word2vec skip-gram model has the lowest, 91.81%. The authors therefore conclude that GloVe is the best of the compared word embedding methods for hotel review data.
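As a rough illustration of how the two word2vec variants named above differ, the sketch below builds the training examples each one would see for a single tokenized review. It is not the authors' pipeline: the function names `skipgram_pairs` and `cbow_examples`, the window size, and the toy Indonesian sentence are all assumptions made for this example, and no actual embedding training or LSTM classification is performed.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: every (center, context) pair is a separate training example."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the surrounding context words jointly predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


# Hypothetical toy review ("clean room and friendly service"),
# not taken from the Traveloka data used in the paper.
review = "kamar bersih dan pelayanan ramah".split()

sg = skipgram_pairs(review, window=1)
cb = cbow_examples(review, window=1)

print(len(sg), "skip-gram pairs, first:", sg[0])
print(len(cb), "CBOW examples, second:", cb[1])
```

Skip-gram produces one example per (center, context) pair, while CBOW produces one example per position with the whole window as input. This is the structural difference between the two word2vec models compared in the study.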
