Building a Vietnamese SentiWordNet Using Vietnamese Electronic Dictionary and String Kernel

In this paper, we propose a novel approach to constructing a Vietnamese SentiWordNet (VSWN), a lexical resource that supports sentiment analysis in Vietnamese. A SentiWordNet is typically generated from WordNet, with each synset assigned numerical scores that indicate its opinion polarity. However, a Vietnamese WordNet is not yet available. We therefore propose a method that constructs a VSWN from a Vietnamese electronic dictionary rather than from WordNet. The main drawback of constructing a VSWN from a dictionary is that it is prone to the sparsity problem, since dictionary glosses are generally short. To address this problem, we adopt a string kernel function that measures string similarity based on both contiguous and non-contiguous common subsequences. Our experimental results show, first, that the string kernel outperforms a baseline model that uses the standard bag-of-words kernel. Second, the Vietnamese SentiWordNet is competitive with the English SentiWordNet, which is constructed from WordNet. These results indicate that our methodology is effective and efficient in constructing a SentiWordNet from an electronic dictionary.
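To illustrate the kind of similarity the abstract describes, the following is a minimal sketch of a gap-weighted subsequence kernel over token sequences, in the style of the string kernel literature. It is not the authors' implementation: the function names `ssk` and `normalized_ssk`, the subsequence length `n`, and the decay factor `lam` are assumptions chosen for illustration. Matching subsequences may be non-contiguous, and each match is down-weighted by `lam` raised to the total span it covers in both sequences.

```python
from functools import lru_cache
from math import sqrt


def ssk(s, t, n=2, lam=0.5):
    """Gap-weighted subsequence kernel of order n (illustrative sketch).

    s and t are sequences (strings or tuples of tokens). Every common
    subsequence of length n contributes lam ** (span in s + span in t),
    so contiguous matches weigh more than matches with gaps.
    """
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def k_prime(i, ls, lt):
        # Auxiliary kernel on the prefixes s[:ls] and t[:lt].
        if i == 0:
            return 1.0
        if min(ls, lt) < i:
            return 0.0
        x = s[ls - 1]
        total = lam * k_prime(i, ls - 1, lt)
        for j in range(lt):
            if t[j] == x:
                total += k_prime(i - 1, ls - 1, j) * lam ** (lt - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k(ls, lt):
        # Order-n kernel on the prefixes s[:ls] and t[:lt].
        if min(ls, lt) < n:
            return 0.0
        x = s[ls - 1]
        total = k(ls - 1, lt)
        for j in range(lt):
            if t[j] == x:
                total += k_prime(n - 1, ls - 1, j) * lam ** 2
        return total

    return k(len(s), len(t))


def normalized_ssk(s, t, n=2, lam=0.5):
    """Kernel value scaled to [0, 1] so that gloss length does not dominate."""
    denom = sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom > 0 else 0.0


# Hypothetical tokenized glosses; real input would be dictionary glosses.
gloss_a = ("very", "good", "quality")
gloss_b = ("very", "high", "quality")
similarity = normalized_ssk(gloss_a, gloss_b, n=2, lam=0.5)
```

In this toy pair the only shared length-2 subsequence is ("very", "quality"), which is non-contiguous in both glosses. An exact bigram match would find no overlap at all, which is the sparsity issue with short glosses that the abstract raises. The recursion is written for readability and suits short sequences; long inputs would call for an iterative dynamic program.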
