Part of speech features for sentiment classification based on Latent Dirichlet Allocation

Machine learning approaches to sentiment analysis generally take a Bag of Words (BoW) representation as input. However, BoW alone is not sufficient for a model to determine the polarity of a document reliably. More specific features are therefore needed to obtain better results. Part of Speech (POS) tagging is one technique for creating such features. With POS-based features, the occurrence of word classes such as adjectives and negations can be detected; adjectives and negations are the main indicators of sentiment or opinion in a document. This study uses POS tagging to perform feature selection. The resulting POS-based features serve as the input to sentiment analysis with Latent Dirichlet Allocation (LDA). The results show that documents processed with POS-based feature selection achieve an accuracy about 7.8% higher than documents without it.
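
The feature selection step described above can be illustrated with a minimal sketch. It assumes the document has already been POS-tagged, and it is not the authors' implementation: the tag names `JJ` (adjective) and `NEG` (negation), the function names, and the sample review are all hypothetical and chosen only for illustration. The sketch keeps the tokens whose tag belongs to a chosen set and then counts them, producing the reduced bag of words that a topic model such as LDA could take as input.

```python
from collections import Counter

# Assumed tag names for adjectives and negations; the tagset actually
# used in the study is not given in the abstract.
KEEP_TAGS = {"JJ", "NEG"}


def pos_filter(tagged_tokens, keep_tags=KEEP_TAGS):
    """Return lowercased tokens whose POS tag is in keep_tags."""
    return [token.lower() for token, tag in tagged_tokens if tag in keep_tags]


def bag_of_words(tokens):
    """Count token occurrences to form a bag-of-words representation."""
    return Counter(tokens)


# Hypothetical pre-tagged review, as (token, tag) pairs.
tagged_review = [
    ("The", "DT"), ("room", "NN"), ("was", "VB"), ("not", "NEG"),
    ("clean", "JJ"), ("but", "CC"), ("staff", "NN"), ("were", "VB"),
    ("friendly", "JJ"),
]

features = bag_of_words(pos_filter(tagged_review))
print(features)
```

Here only "not", "clean", and "friendly" survive the filter, so the model sees the words most likely to carry sentiment rather than the full vocabulary of the review.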