Minimally-Supervised Sentiment Lexicon Induction Model: A Case Study of Malay Sentiment Analysis

Vital to the task of mining sentiment from text is a sentiment lexicon: a dictionary of terms annotated with a priori information along the semantic dimension of sentiment. Each term is assigned a general, out-of-context sentiment polarity. Unfortunately, online dictionaries and similar lexical resources do not readily include information on the sentiment properties of their entries. Moreover, manually compiling sentiment lexicons is costly in annotator time and effort. This has given rise to a large body of research on automated sentiment lexicon generation algorithms. Most of these algorithms were designed for English, owing to the abundance of readily available lexical resources in that language. This is not the case for low-resource languages such as Malay. Although research on Malay sentiment analysis has increased sharply over the past few years, the subtask of sentiment lexicon induction for this language remains under-investigated. We present a minimally-supervised sentiment lexicon induction model designed specifically for the Malay language. It takes as input only two initial paradigm terms, one positive and one negative, and mines WordNet Bahasa’s synonym chains and Kamus Dewan’s gloss information to extract subjective, sentiment-laden terms. The model automatically bootstraps a reliable, high-coverage sentiment lexicon that can be employed in Malay sentiment analysis on full text. Intrinsic evaluation of the model against a manually annotated test set demonstrates that its ability to assign sentiment properties to terms is on par with human judgement.
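The bootstrapping idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the seed terms, the toy synonym table, the invented glosses, and the function names `expand_by_synonyms` and `induce_lexicon` are all assumptions made for illustration, and the sketch assumes one positive and one negative seed. The real model draws on WordNet Bahasa and Kamus Dewan, whose interfaces are not shown here.

```python
from collections import deque

# Hypothetical toy stand-in for WordNet Bahasa synonym links (illustration only).
SYNONYMS = {
    "baik": ["bagus", "elok"],
    "bagus": ["baik", "hebat"],
    "elok": ["baik"],
    "hebat": ["bagus"],
    "buruk": ["jahat", "teruk"],
    "jahat": ["buruk"],
    "teruk": ["buruk"],
}

# Hypothetical toy stand-in for Kamus Dewan glosses (invented, not real entries).
GLOSSES = {
    "mulia": "sifat yang baik dan terpuji",
    "keji": "perbuatan yang buruk dan hina",
    "meja": "perabot berkaki untuk menulis",
}


def expand_by_synonyms(seed, synonyms):
    """Follow synonym chains breadth-first and return every reachable term."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        term = queue.popleft()
        for neighbour in synonyms.get(term, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def induce_lexicon(pos_seed, neg_seed, synonyms, glosses):
    """Bootstrap a polarity lexicon from one positive and one negative seed."""
    positive = expand_by_synonyms(pos_seed, synonyms)
    negative = expand_by_synonyms(neg_seed, synonyms)

    # Terms reachable from both seeds are treated as ambiguous and dropped.
    ambiguous = positive & negative
    lexicon = {t: "positive" for t in positive - ambiguous}
    lexicon.update({t: "negative" for t in negative - ambiguous})

    # Gloss pass: label a headword by the majority polarity of the
    # already-labelled terms that appear in its gloss.
    gloss_labels = {}
    for headword, gloss in glosses.items():
        if headword in lexicon:
            continue
        tokens = gloss.split()
        pos_hits = sum(1 for tok in tokens if lexicon.get(tok) == "positive")
        neg_hits = sum(1 for tok in tokens if lexicon.get(tok) == "negative")
        if pos_hits > neg_hits:
            gloss_labels[headword] = "positive"
        elif neg_hits > pos_hits:
            gloss_labels[headword] = "negative"
        # Ties and glosses without sentiment terms leave the headword unlabelled.
    lexicon.update(gloss_labels)
    return lexicon


# Assumed seeds: "baik" (good) and "buruk" (bad).
lexicon = induce_lexicon("baik", "buruk", SYNONYMS, GLOSSES)
```

In this toy run, terms such as "hebat" and "teruk" are reached through synonym chains, while "mulia" and "keji" are labelled only because a seed term occurs in their glosses; "meja" stays out of the lexicon because its gloss carries no sentiment term. Dropping terms reachable from both seeds is a design choice of the sketch, not something the abstract specifies.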
