Negation handling in sentiment classification using rule-based adapted from Indonesian language syntactic for Indonesian text in Twitter

Negation words can reverse the polarity of a text, and if they are not handled properly they degrade the performance of sentiment classification. Negation words in Indonesian are 'tidak', 'bukan', 'belum' and 'jangan'. Conjunctions such as 'tetapi' or 'tapi' can also reverse the polarity. A unigram representation handles negation poorly because it treats the negation word and the negated words as separate features. A common approach to negation handling in English text adds the tag 'NEG_' to every word that follows a negation word, up to the first punctuation mark. However, this may tag words that are not actually negated, and it does not handle a negation and a conjunction occurring in the same sentence. This study proposes a rule-based method that determines the scope of negation, that is, which words are negated, by adapting the syntactic rules of negation in Indonesian. Combining the adapted syntactic rules and 'NEG_' tagging with an SVM classifier using an RBF kernel gave better performance than the other experiments. In terms of average F1-score, the proposed method improves on the baseline by 1.79% (baseline without negation handling) and 5% (baseline with existing negation handling) on a dataset in which every tweet contains negation words. On a second dataset, in which the number of negation words per tweet varies, it improves on the baseline by 2.69% (without negation handling) and 3.17% (with existing negation handling).
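
The existing approach mentioned in the abstract, tagging every word after a negation word with 'NEG_' until the first punctuation mark, can be sketched as follows. This is a minimal illustration of that baseline only, not the rule-based method proposed in the study. The function name `tag_negation`, the regex tokenizer, the set of scope-ending punctuation marks and the example sentences are all assumptions made for this sketch; only the four negation words come from the abstract.

```python
import re

# Negation words listed in the abstract.
NEGATION_WORDS = {"tidak", "bukan", "belum", "jangan"}

# Assumption: these punctuation marks close the negation scope.
SCOPE_END = set(".,;:!?")


def tag_negation(text):
    """Prefix 'NEG_' to each word after a negation word, up to the next punctuation mark."""
    # Assumption: a simple tokenizer that splits words from single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    tagged = []
    in_scope = False
    for tok in tokens:
        if tok in SCOPE_END:
            in_scope = False          # punctuation ends the current scope
            tagged.append(tok)
        elif tok in NEGATION_WORDS:
            in_scope = True           # open (or keep open) a negation scope
            tagged.append(tok)
        elif in_scope and tok[0].isalnum():
            tagged.append("NEG_" + tok)
        else:
            tagged.append(tok)
    return tagged


# Hypothetical example sentences, written for this sketch.
print(tag_negation("Filmnya tidak bagus, tapi saya suka."))
print(tag_negation("saya tidak suka filmnya tapi aktornya bagus"))
```

In the second sentence there is no punctuation before 'tapi', so the clause after the conjunction ('aktornya bagus') is tagged as negated as well. This is the kind of over-tagging of un-negated words that the abstract gives as a weakness of the approach.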
